Random forest-based QSAR modeling for predicting the potency of neprilysin inhibitors using Mordred molecular descriptors

Authors

  • Nizam Albar, Department of Computer Engineering, Faculty of Engineering, Universitas Serambi Mekkah, Banda Aceh, Indonesia, https://orcid.org/0009-0009-0254-1920
  • Derren DCH. Rampengan, Faculty of Medicine, Universitas Sam Ratulangi, Manado, Indonesia, https://orcid.org/0009-0002-5482-0613
  • Saiful Azhari, Department of Pharmacy, STIKES Assyifa Aceh, Banda Aceh, Indonesia
  • Mahmudi Mahmudi, Department of Pharmacy, STIKES Assyifa Aceh, Banda Aceh, Indonesia
  • Farrah Fahdhienie, Faculty of Public Health, Universitas Muhammadiyah Aceh, Banda Aceh, Indonesia, https://orcid.org/0000-0002-6545-1772
  • Anggi Susilawati, Faculty of Teacher Training and Education, Universitas Bina Bangsa Getsempena, Banda Aceh, Indonesia
  • Muhammad Habiburrahman, Faculty of Medicine, Imperial College London, London, United Kingdom

DOI:

https://doi.org/10.52225/narrax.v4i1.242

Keywords:

Cardiology drug, drug discovery, heart failure, machine learning, quantitative structure-activity relationship

Abstract

Neprilysin (NEP) is a zinc-dependent metallopeptidase and a key therapeutic target in heart failure management. Efficient identification of potent NEP inhibitors remains a challenge in drug discovery. The aim of this study was to develop a quantitative structure–activity relationship (QSAR) model using 2D Mordred molecular descriptors and the Random Forest algorithm to predict the inhibitory potency (pIC50) of drug candidates. A curated dataset of compounds with experimentally determined IC50 values (in nM) against NEP was preprocessed, and the IC50 values were converted to pIC50. Mordred was used to calculate 2D molecular descriptors, and descriptors with missing values were excluded. The dataset was split into training, internal validation, and external test sets. A Random Forest regression model was trained using 500 estimators, and its performance was evaluated using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC). A binary classification model was also constructed. Feature importance analysis, residual analysis, and chemical space visualization were conducted to assess model interpretability and reliability. In the internal validation, the regression model showed modest predictive performance, with R²=0.286, RMSE=0.949, MAE=0.723, and CCC=0.532. Performance was stronger on the external test set, with R²=0.659, RMSE=0.858, MAE=0.630, and CCC=0.763. The binary classification model achieved an accuracy of 0.953, precision of 1.000, recall of 0.943, and F1-score of 0.971, indicating strong discriminative ability in classifying inhibitory versus non-inhibitory compounds. The top contributing descriptors were ATSC2p (feature importance=0.0505), GATS2p (0.0408), and SaasC (0.0317). Principal component analysis (PCA) and Williams plots indicated that the test compounds lie within the model’s applicability domain, with no major outliers in leverage or residual distribution.
The developed Random Forest-based QSAR model demonstrates strong predictive power and interpretability for identifying NEP inhibitors. This study provides a valuable tool for virtual screening and highlights the relevance of 2D structural features in governing NEP inhibitory activity. It is the first dedicated QSAR analysis of neprilysin inhibition using Mordred descriptors with rigorous internal and external validation.
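
For readers who want to reproduce the reported quantities, the following is a minimal Python sketch of the pIC50 conversion and the evaluation metrics named in the abstract. It is not the authors' code. The function names are hypothetical, R² is assumed to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), and CCC is assumed to be Lin's concordance correlation coefficient computed with population variances; the study may have used different conventions.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and CCC for paired observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # Population variances and covariance for Lin's CCC (assumed definition).
    var_t = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "ccc": 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2),
    }


def f1_from_precision_recall(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(pic50_from_ic50_nm(100.0))
    print(regression_metrics([5.0, 6.0, 7.0], [5.5, 6.0, 6.5]))
    # Precision and recall as reported in the abstract.
    print(round(f1_from_precision_recall(1.000, 0.943), 3))
```

With the reported precision of 1.000 and recall of 0.943, the harmonic mean rounds to the F1-score of 0.971 given in the abstract. Unlike R², CCC penalizes systematic offsets between predicted and observed values, so the two metrics can rank the same models differently.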

Published

2026-04-01

Section

Original Article