Random forest-based QSAR modeling for predicting the potency of neprilysin inhibitors using Mordred molecular descriptors

Nizam Albar; Derren DCH. Rampengan; Saiful Azhari; Mahmudi Mahmudi; Farrah Fahdhienie; Anggi Susilawati; Muhammad Habiburrahman

doi:10.52225/narrax.v4i1.242

Authors

Nizam Albar Department of Computer Engineering, Faculty of Engineering, Universitas Serambi Mekkah, Banda Aceh, Indonesia https://orcid.org/0009-0009-0254-1920
Derren DCH. Rampengan Faculty of Medicine, Universitas Sam Ratulangi, Manado, Indonesia https://orcid.org/0009-0002-5482-0613
Saiful Azhari Department of Pharmacy, STIKES Assyifa Aceh, Banda Aceh, Indonesia
Mahmudi Mahmudi Department of Pharmacy, STIKES Assyifa Aceh, Banda Aceh, Indonesia
Farrah Fahdhienie Faculty of Public Health, Universitas Muhammadiyah Aceh, Banda Aceh, Indonesia https://orcid.org/0000-0002-6545-1772
Anggi Susilawati Faculty of Teacher Training and Education, Universitas Bina Bangsa Getsempena, Banda Aceh, Indonesia
Muhammad Habiburrahman Faculty of Medicine, Imperial College London, London, United Kingdom

DOI:

https://doi.org/10.52225/narrax.v4i1.242

Keywords:

Cardiology drug, drug discovery, heart failure, machine learning, quantitative structure-activity relationship

Abstract

Neprilysin (NEP) is a zinc-dependent metallopeptidase and a key therapeutic target in heart failure management. Efficient identification of potent NEP inhibitors remains a challenge in drug discovery. The aim of this study was to develop a quantitative structure–activity relationship (QSAR) model using 2D Mordred molecular descriptors and a Random Forest algorithm to predict the inhibitory potency (pIC₅₀) of drug candidates. A curated dataset of compounds with experimentally determined IC₅₀ values (in nM) against NEP was preprocessed, and the IC₅₀ values were converted to pIC₅₀. Mordred was used to calculate 2D molecular descriptors, and descriptors with missing values were excluded. The dataset was split into training, internal validation, and external test sets. A Random Forest regression model was trained using 500 estimators, and its performance was evaluated using R², root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC). A binary classification model was also constructed. Feature importance, residual analysis, and chemical space visualization were conducted to assess model interpretability and reliability. The regression model showed modest predictive performance in internal validation, with R² of 0.286, RMSE of 0.949, MAE of 0.723, and CCC of 0.532. External validation showed improved generalization, with R² of 0.659, RMSE of 0.858, MAE of 0.630, and CCC of 0.763. Binary classification yielded an accuracy of 0.953, precision of 1.000, recall of 0.943, and an F1-score of 0.971, indicating strong discriminative ability in classifying inhibitory versus non-inhibitory compounds. Top contributing descriptors included ATSC2p (feature importance=0.0505), GATS2p (0.0408), and SaasC (0.0317). Principal component analysis (PCA) and Williams plots confirmed that test compounds lie within the model’s applicability domain, with no major outliers in leverage or residual distribution.
The developed Random Forest-based QSAR model demonstrates strong predictive power and interpretability for identifying NEP inhibitors. This study provides a valuable tool for virtual screening and highlights the relevance of 2D structural features in governing NEP inhibitory activity. It is the first dedicated QSAR analysis of neprilysin inhibition using Mordred descriptors with rigorous internal and external validation.
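
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the usual conventions: pIC₅₀ is the negative base-10 logarithm of IC₅₀ in mol/L, R² is the coefficient of determination, and CCC is Lin's concordance correlation coefficient with population variances. The abstract does not state which definitions were used, and all function names below are hypothetical.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nM to pIC50 (assumed: -log10 of IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and Lin's CCC for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # Population (divide-by-n) variances and covariance, as in Lin's CCC
    var_t = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "ccc": 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2),
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # An IC50 of 1000 nM (1 µM) corresponds to a pIC50 of about 6
    print(pic50_from_nm(1000.0))
    # Toy values only, not data from the study
    print(regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
    # Precision and recall as reported in the abstract
    print(round(f1_from_precision_recall(1.000, 0.943), 3))
```

Under these definitions, the reported precision (1.000) and recall (0.943) reproduce the reported F1-score of 0.971. The toy regression example also shows why R² and CCC can differ: a constant offset between predicted and observed values lowers both, even when the two series are perfectly correlated.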
