PLM-IL4

Introduction

Despite advances in antiviral drug and vaccine development, infections remain a major concern. Interleukin-4 (IL-4) plays a crucial role in immune regulation and allergic responses. This study aims to improve prediction accuracy by addressing data imbalance and strengthening feature extraction. For preprocessing, Edited Nearest Neighbors (ENN) and the Synthetic Minority Over-sampling Technique (SMOTE) are used to balance the biomedical datasets, which improves model robustness. A 30-layer ESM-2 model extracts deep-level features, which are then fed into a Gated Recurrent Unit (GRU) model for prediction. Hyperparameter tuning and learning rate schedulers further optimize model performance. The results show a significant improvement in prediction accuracy: the proposed method achieves an AUC of 0.98 and an accuracy of 93.1%, which validates its effectiveness and supports future immunotherapy and vaccine development.
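
To make the resampling step concrete, here is a minimal sketch of SMOTE followed by ENN in plain Python. It is not the project's actual preprocessing code. The function names, the toy 2-D data, the choice of `k`, the majority-vote form of ENN, and the order of the two steps (oversample first, then clean) are all assumptions made for illustration. In practice a library such as imbalanced-learn provides `SMOTE`, `EditedNearestNeighbours`, and the combined `SMOTEENN`.

```python
import math
import random


def knn_indices(points, query_idx, k, candidates):
    """Indices of the k candidates closest to points[query_idx], excluding itself."""
    q = points[query_idx]
    others = [i for i in candidates if i != query_idx]
    others.sort(key=lambda i: math.dist(q, points[i]))
    return others[:k]


def smote(X, y, minority_label, n_new, k=3, seed=0):
    """Add n_new synthetic minority samples, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority)
        j = rng.choice(knn_indices(X, i, k, minority))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [minority_label] * n_new


def enn(X, y, k=3):
    """Keep a sample only if most of its k nearest neighbours share its label
    (majority-vote ENN, applied to every class)."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        neighbours = knn_indices(X, i, k, all_idx)
        agree = sum(1 for j in neighbours if y[j] == y[i])
        if 2 * agree > len(neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: six class-0 points, two class-1 points, and one
# class-0 point (the last) sitting inside the class-1 cluster as label noise.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.1, 2.2],
     [2.05, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 0]

X_bal, y_bal = smote(X, y, minority_label=1, n_new=5, k=1)  # 7 vs 7 samples
X_clean, y_clean = enn(X_bal, y_bal, k=3)                   # drops the noisy point
print(y_clean.count(0), y_clean.count(1))
```

SMOTE evens out the class counts, and ENN then removes samples that sit among neighbours of the other class. On real data the features would be ESM-2 embeddings rather than 2-D points, and the neighbourhood size would need tuning.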