Machine Learning Integration of Clinical and Molecular Biomarkers to Predict Vascular Complications in Type 2 Diabetes

Gerardo García-Gil, Víctor Manuel Medina-Pérez, Joaquín Becerra-Contreras et al.

30 June 2026

Abstract

Background/Objectives: Type 2 diabetes mellitus (T2DM) is a major global health challenge due to its high prevalence and association with chronic complications, highlighting the need for reliable predictive tools to support clinical decision-making. Methods: This study proposes a two-stage hierarchical prediction system based on a Random Forest (RF) classifier. In Stage 1, the model performs multiclass classification into healthy (H), T2DM without complications (D), and T2DM with complications (C). In Stage 2, patients classified as C are further stratified into microvascular or macrovascular complications. The dataset included 31 biochemical, molecular, inflammatory, and oxidative stress variables from Mexican and Spanish cohorts. Feature selection was performed using Pearson correlation, and feature relevance was further assessed using RF importance measures. Model training used stratified cross-validation, with additional evaluation on a hold-out set to approximate real-world performance. Results: The optimized RF achieved an accuracy of 92% and a macro F1-score of 0.92, outperforming baseline models, with an AUC-ROC of 0.89 for complication prediction. Key predictive features included IL-18, miR-126, duration of T2DM, HbA1c, and IL-10. Conclusions: The novelty of this study lies in integrating heterogeneous biomarkers within a hierarchical predictive framework, rather than in the machine learning algorithm itself. This multimodal approach, combined with interpretable machine learning techniques, is designed to deliver clinically meaningful insights for patient stratification and personalized management in T2DM.
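The two-stage design described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and purely synthetic data. The class name `TwoStageRF`, the helper `make_synthetic_cohort`, the hyperparameters, and the simulated signal are all hypothetical and are not taken from the study; the Pearson-based feature selection step is omitted. The sketch shows only how a Stage 1 classifier (H / D / C) can hand its C predictions to a Stage 2 classifier (microvascular / macrovascular), with stratified cross-validation on the training split and a separate hold-out evaluation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split


def make_synthetic_cohort(n=600, n_features=31, seed=0):
    """Hypothetical stand-in for the 31-variable dataset (random data, not real patients)."""
    rng = np.random.default_rng(seed)
    stage1 = rng.choice(np.array(["H", "D", "C"]), size=n)
    subtype = rng.choice(np.array(["micro", "macro"]), size=n)
    stage2 = np.where(stage1 == "C", subtype, "none")
    X = rng.normal(size=(n, n_features))
    # Inject an artificial signal so the toy problem is learnable.
    X[:, 0] += (stage1 == "D") * 3.0 + (stage1 == "C") * 6.0
    X[:, 1] += (stage2 == "macro") * 4.0
    return X, stage1, stage2


class TwoStageRF:
    """Stage 1: H vs D vs C. Stage 2: micro vs macro, applied only to predicted C."""

    def __init__(self, seed=0):
        self.stage1 = RandomForestClassifier(n_estimators=200, random_state=seed)
        self.stage2 = RandomForestClassifier(n_estimators=200, random_state=seed)

    def fit(self, X, y_stage1, y_stage2):
        self.stage1.fit(X, y_stage1)
        has_complication = y_stage1 == "C"
        # Stage 2 is trained only on patients with known complications.
        self.stage2.fit(X[has_complication], y_stage2[has_complication])
        return self

    def predict(self, X):
        stage1_pred = self.stage1.predict(X)
        out = stage1_pred.astype(object)
        routed = stage1_pred == "C"
        if routed.any():
            # Replace each predicted "C" with its predicted complication subtype.
            out[routed] = self.stage2.predict(X[routed])
        return out.astype(str)


X, y1, y2 = make_synthetic_cohort()
X_tr, X_te, y1_tr, y1_te, y2_tr, y2_te = train_test_split(
    X, y1, y2, test_size=0.25, stratify=y1, random_state=0
)

# Stratified cross-validation of the Stage 1 classifier on the training split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_f1 = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X_tr, y1_tr, cv=cv, scoring="f1_macro",
)

# Hold-out evaluation of the full hierarchy (labels: H, D, micro, macro).
model = TwoStageRF().fit(X_tr, y1_tr, y2_tr)
pred = model.predict(X_te)
truth = np.where(y1_te == "C", y2_te, y1_te)
holdout_f1 = f1_score(truth, pred, average="macro")

# Impurity-based importances, the kind of RF relevance measure the abstract mentions.
top_stage1 = np.argsort(model.stage1.feature_importances_)[::-1][:5]

print("CV macro F1 (stage 1):", cv_f1.mean())
print("Hold-out macro F1 (hierarchy):", holdout_f1)
print("Top stage 1 feature indices:", top_stage1)
```

One property of this design is that Stage 1 errors propagate: a patient with complications who is misclassified as D never reaches Stage 2, so hold-out metrics for the full hierarchy can be lower than those of either stage evaluated alone.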

Metadata

DOI: 10.3390/diagnostics16132040

License: CC BY 4.0

IPC Classification

G06; A61; C07

Keywords

machine learning; integration; clinical; molecular; biomarkers; predict; vascular; complications; type; diabetes; diagnostics; background; objectives; mellitus; T2DM; major; global; health; challenge; high; prevalence; association; chronic
