Abstract
Background/Objectives: Type 2 diabetes mellitus (T2DM) is a major global health challenge due to its high prevalence and association with chronic complications, highlighting the need for reliable predictive tools to support clinical decision-making. Methods: This study proposes a two-stage hierarchical prediction system based on a Random Forest (RF) classifier. In Stage 1, the model performs multiclass classification into healthy (H), T2DM without complications (D), and T2DM with complications (C). In Stage 2, patients classified as C are further stratified into microvascular or macrovascular complications. The dataset included 31 biochemical, molecular, inflammatory, and oxidative stress variables from Mexican and Spanish cohorts. Feature selection was performed using Pearson correlation, and feature relevance was further assessed using RF importance measures. Model training used stratified cross-validation, with additional evaluation on a hold-out set to approximate real-world performance. Results: The optimized RF achieved an accuracy of 92% and a macro F1-score of 0.92, outperforming baseline models, with an AUC-ROC of 0.89 for complication prediction. Key predictive features included IL-18, miR-126, duration of T2DM, HbA1c, and IL-10. Conclusions: The novelty of this study lies in integrating heterogeneous biomarkers within a hierarchical predictive framework, rather than in the machine learning algorithm itself. This multimodal approach, combined with interpretable machine learning techniques, is designed to deliver clinically meaningful insights for patient stratification and personalized management in T2DM.
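The two-stage design described in the Methods can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic random values standing in for the 31 variables, and the hyperparameters, the split ratio, and names such as `predict_hierarchical`, `clf1`, and `clf2` are assumptions introduced here. Stage 1 assigns H, D, or C; Stage 2 is trained only on patients with complications and is applied only to samples that Stage 1 labels as C.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 300 patients x 31 features (NOT the study's data).
n_patients, n_features = 300, 31
X = rng.normal(size=(n_patients, n_features))
stage1 = rng.choice(np.array(["H", "D", "C"]), size=n_patients)
# Subtype is only meaningful for patients whose stage-1 label is "C".
subtype = rng.choice(np.array(["micro", "macro"]), size=n_patients)

# Inject artificial signal so the forests have something to learn.
X[stage1 == "D", 0] += 2.0
X[stage1 == "C", 1] += 2.0
X[(stage1 == "C") & (subtype == "macro"), 2] += 2.0

# Stratified hold-out split on the stage-1 label (25% is an assumed ratio).
idx_train, idx_test = train_test_split(
    np.arange(n_patients), test_size=0.25, stratify=stage1, random_state=0
)

# Stage 1: three-class model (H / D / C).
clf1 = RandomForestClassifier(n_estimators=200, random_state=0)
clf1.fit(X[idx_train], stage1[idx_train])

# Stage 2: binary model trained only on training patients with complications.
c_train = idx_train[stage1[idx_train] == "C"]
clf2 = RandomForestClassifier(n_estimators=200, random_state=0)
clf2.fit(X[c_train], subtype[c_train])


def predict_hierarchical(X_new):
    """Return H, D, C-micro, or C-macro for each row of X_new."""
    stage1_pred = clf1.predict(X_new)
    is_c = stage1_pred == "C"
    labels = stage1_pred.astype(object)
    if is_c.any():
        # Only samples routed to class C reach the second classifier.
        sub_pred = clf2.predict(X_new[is_c])
        labels[is_c] = ["C-" + s for s in sub_pred]
    return labels


pred = predict_hierarchical(X[idx_test])
```

In a sketch like this, errors made in Stage 1 propagate to Stage 2, since a patient misclassified as H or D is never evaluated for complication subtype. The fitted forests also expose `feature_importances_`, which is one way the RF importance ranking mentioned above could be inspected.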