Abstract
When some treatment labels are missing, the number of observed treated units can be much smaller than the number of control units, and the resulting imbalance makes treatment effect estimation more challenging. To address this issue, we develop an inference framework that integrates an Absent Data Generating (ADG) algorithm with kernel matching. The proposed method incorporates a modified Kernel Fisher Discriminant (KFD) into the causal inference framework to recover missing treatment assignments. Existing oversampling methods such as SMOTE and Borderline-SMOTE are designed mainly for imbalanced classification problems and generate synthetic observations through local interpolation. In contrast, the ADG algorithm learns the main patterns in the observed treated units and then generates the unobserved samples by iteratively updating the data structure and drawing samples based on the learned relationships. Consequently, it provides a distribution-based mechanism for recovering missing treatment information rather than relying solely on neighborhood-based interpolation. The recovered treatment labels are subsequently incorporated into kernel matching to improve point estimation of, and interval inference for, the average treatment effect on the treated (ATT). Monte Carlo simulations and a real-data application demonstrate that the proposed method provides more accurate ATT estimates and more reliable confidence intervals under severe treatment-group imbalance.