Multi-Stage Evaluation Framework for Identifying Deployment-Ready Prediabetes Prediction Models


Michael Sher, Milan Toma

July 3, 2026

Abstract

Selecting machine learning algorithms for clinical deployment demands evaluation that goes beyond conventional performance metrics. While automated frameworks simplify model generation, identifying algorithms suitable for real-world medical applications requires systematic assessment of learning dynamics, generalization stability, and cross-subset reliability. This study addresses prediabetes prediction through a multi-stage evaluation comparing automated machine learning frameworks, neural networks, gradient boosting implementations (XGBoost, CatBoost, LightGBM), and specialized imbalance-handling techniques. A questionnaire-based dataset with substantial class imbalance was analyzed through four progressive evaluation stages: aggregate performance metrics, learning curve analysis, minority class detection capability, and cross-subset generalization stability. Linear Discriminant Analysis achieved the highest validation metrics in the automated screening stage but exhibited flat learning curves, indicating exhausted learning capacity. XGBoost showed the most favorable convergence dynamics and the highest validation performance (0.749 AUC), yet suffered a substantial validation-to-test degradation (5.9 percentage points). CatBoost, despite lower validation performance (0.696 accuracy), exhibited exceptional cross-subset stability, declining by only 0.2 percentage points to a comparable test accuracy of 0.694. CatBoost was therefore selected for deployment on the basis of its superior generalization stability. These results show that a multi-dimensional evaluation spanning aggregate metrics, learning dynamics, and cross-subset stability is essential for identifying clinically deployable models, because validation performance alone provides insufficient evidence of real-world applicability.
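Two of the criteria in the abstract, flat learning curves and validation-to-test degradation, can be made concrete with a small sketch. This is not the authors' procedure: the abstract does not give thresholds or a selection rule, so the names `Candidate`, `degradation_pp`, `is_flat`, and `select`, the "flat curve" heuristic, and the default values `min_gain=0.01` and `max_drop_pp=1.0` are all hypothetical. It also assumes the same metric (for example accuracy) is used on the validation and test subsets, and it does not cover the minority class detection stage.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """One trained model summarized by the quantities the stages compare (hypothetical)."""
    name: str
    val_score: float                 # validation metric, e.g. accuracy
    test_score: float                # same metric on the held-out test subset
    learning_curve: Sequence[float]  # validation score at increasing training sizes


def degradation_pp(c: Candidate) -> float:
    """Validation-to-test drop in percentage points (positive = worse on test)."""
    return (c.val_score - c.test_score) * 100.0


def is_flat(curve: Sequence[float], min_gain: float = 0.01) -> bool:
    """Assumed heuristic: a curve is 'flat' (learning capacity exhausted) when the
    score gained from the smallest to the largest training size is below min_gain."""
    return (curve[-1] - curve[0]) < min_gain


def select(candidates: Sequence[Candidate], max_drop_pp: float = 1.0) -> Optional[Candidate]:
    """Drop models with flat learning curves or large validation-to-test drops,
    then return the survivor with the highest test score (None if none survive)."""
    survivors = [
        c for c in candidates
        if not is_flat(c.learning_curve) and degradation_pp(c) <= max_drop_pp
    ]
    return max(survivors, key=lambda c: c.test_score, default=None)


# Usage with the CatBoost accuracies reported in the abstract (curve not needed here):
catboost = Candidate("CatBoost", val_score=0.696, test_score=0.694, learning_curve=[])
print(round(degradation_pp(catboost), 1))  # 0.2
```

Ordering the filters before the ranking mirrors the abstract's argument that a model is first screened for stability and only then compared on performance; a real pipeline would also need the learning curves themselves and a minority-class metric such as recall.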

Metadata

DOI: 10.3390/make8070192
License: CC BY 4.0

IPC Classification

G06H04A61

Keywords

multi-stage evaluation framework identifying deployment-ready prediabetes prediction models machine learning knowledge extraction selecting algorithms clinical deployment demands comprehensive beyond conventional performance metrics while automated
