A Computational Analysis of Information Reduction in Academic Risk Prediction
Orlando Arroyo, José M. Benjumea, Jesús Villalba
July 2, 2026

Abstract

Academic risk prediction often assumes that richer educational data yields better performance, yet many institutions operate with strictly limited information. This study quantifies how predictive behavior degrades as the information available in the feature space is progressively constrained. Using prerequisite-based academic records from three Civil Engineering courses at Universidad Industrial de Santander, Colombia, we evaluate nested information regimes ranging from a no-information baseline to a full ten-variable prerequisite representation. We assess logistic regression models through repeated cross-validation, measuring continuous discrimination, probabilistic calibration, and threshold-dependent classification. Additional paired and chronology-controlled robustness analyses test whether the reduced and full regimes differ materially across folds, nonlinear learners, first-attempt cohorts, and target-year blocks. The results demonstrate predictive compressibility within prerequisite-only records: moving from the baseline to a one-dimensional model (the average prerequisite grade) drives the largest performance gain, while subsequent feature additions yield rapidly diminishing returns. Despite a 90% reduction in dimensionality (from ten variables to one), this one-variable model retains between 98.0% and 100.1% of the full-model ROC-AUC and maintains stable probabilistic calibration. These findings expose substantial structural redundancy within broader prerequisite representations, suggesting that sparse, interpretable academic summaries capture most of the measurable prerequisite-based predictive signal. Ultimately, academic risk estimation remains feasible under severe data constraints, although the moderate absolute ROC-AUC values indicate that these models are best interpreted as transparent triage tools rather than complete explanations of academic performance.
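
The two headline quantities in the abstract, the dimensionality reduction and the share of full-model ROC-AUC retained by the reduced model, can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not data or results from the study, and the helper names (`roc_auc`, `auc_retention`, `dimensionality_reduction`) are hypothetical. ROC-AUC is computed here by its pairwise-ranking definition, which is an assumption about the metric's form and not necessarily the authors' implementation.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def auc_retention(auc_reduced, auc_full):
    """Reduced-model ROC-AUC as a percentage of the full-model ROC-AUC."""
    return 100.0 * auc_reduced / auc_full


def dimensionality_reduction(n_reduced, n_full):
    """Percentage of input variables dropped by the reduced model."""
    return 100.0 * (n_full - n_reduced) / n_full


# Toy, purely illustrative data: 1 = at-risk student, 0 = not at risk.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predicted risk from a ten-variable model and a one-variable model.
full_scores = [0.90, 0.70, 0.60, 0.40, 0.30, 0.65, 0.80, 0.20]
reduced_scores = [0.85, 0.60, 0.55, 0.45, 0.35, 0.70, 0.75, 0.30]

auc_full = roc_auc(labels, full_scores)
auc_reduced = roc_auc(labels, reduced_scores)
retained = auc_retention(auc_reduced, auc_full)
dropped = dimensionality_reduction(n_reduced=1, n_full=10)

print(f"full AUC={auc_full:.4f}, reduced AUC={auc_reduced:.4f}")
print(f"retention={retained:.1f}%, variables dropped={dropped:.0f}%")
```

A retention value above 100%, such as the 100.1% upper bound reported in the abstract, simply means that the reduced model's ROC-AUC slightly exceeds the full model's in that setting.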

IPC Classification

G06C07B60

Keywords

computational, analysis, information, reduction, academic, risk, prediction, mathematical, applications, often, assumes, richer, educational, data, yields, better, performance, many, institutions, operate, strictly, limited, quantifies, predictive