Abstract
Reliable osteosarcoma tissue characterization combines radiographic evidence of bone-level structural change with histopathological assessment of cellular morphology. In rare cancers, however, patient-matched multimodal datasets are rarely available because radiology and pathology follow separate clinical workflows and cohort sizes are small. This study examines whether decision-level fusion can integrate radiographs and histopathology images originating from independent, unpaired patient cohorts, and reports the results as a methodological proof of concept rather than as a clinically validated diagnostic system. Two EfficientNet-B0 encoders were trained separately using osteosarcoma-positive radiographs from the Kaggle Bone Tumor Classification dataset (180 images) and H&E-stained histopathology tiles from the TCIA Osteosarcoma Tumor Assessment collection (1144 tiles from four patients). Histopathology tiles carry three labels: non-tumor, viable tumor, and necrotic tumor. Because radiographs do not provide tissue-viability labels and cannot directly distinguish viable from necrotic tumor, the radiographic branch was used as a weak radiograph-derived probability prior mapped into the shared three-class decision space during fusion. Fusion operates only on modality-specific probability vectors; no case-level or patient-level pairing is assumed or required. An adaptive gating network estimates a per-sample radiograph-prior weight, α, from the concatenated vector [P_r, P_h], where P_r denotes the radiograph-derived probability prior and P_h denotes the histopathology probability vector; the histopathology weight is 1 − α. To limit leakage in the small histopathology cohort, the four patients were assigned to fixed training (P001 and P002), calibration (P003), and test (P004) partitions with strict patient-level separation.
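As a minimal sketch of the fusion step described above, the snippet below combines a radiograph-derived prior and a histopathology probability vector with a per-sample weight α. Two points are assumptions made for illustration and are not stated in the abstract: the fused vector is taken to be the convex combination α · prior + (1 − α) · histopathology, and the gate is written as a single sigmoid layer. The function names, the weights `w` and `b`, and the example probabilities are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_alpha(p_r, p_h, w, b):
    """Hypothetical one-layer gate: alpha = sigmoid(w . [p_r, p_h] + b)."""
    z = np.concatenate([p_r, p_h], axis=-1)  # shape (n_samples, 6)
    return sigmoid(z @ w + b)                # shape (n_samples,)

def fuse(p_r, p_h, alpha):
    """Assumed convex combination: alpha weights the radiograph prior."""
    alpha = np.asarray(alpha)[..., None]     # broadcast over the 3 classes
    return alpha * p_r + (1.0 - alpha) * p_h

# Class order: non-tumor, viable tumor, necrotic tumor (illustrative values).
p_r = np.array([[0.20, 0.40, 0.40]])   # weak radiograph-derived prior
p_h = np.array([[0.05, 0.85, 0.10]])   # histopathology probabilities

# With zero weights the gate ignores its input, so alpha is the same for
# every sample; this bias reproduces the fixed-alpha baseline at 0.25.
w = np.zeros(6)
b = np.log(0.25 / 0.75)

alpha = gate_alpha(p_r, p_h, w, b)
fused = fuse(p_r, p_h, alpha)
predicted_class = fused.argmax(axis=-1)
```

Because both inputs are probability vectors and α lies in (0, 1), the fused vector also sums to one under this assumed form. A trained gate would replace the zero weights with learned values, so that α varies from tile to tile.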
On the single held-out test patient (171 tiles), adaptive gating fusion classified 166 of 171 tiles correctly (97.08% accuracy, macro-F1 of 0.97, and macro-AUC of 0.99), compared with 161 of 171 tiles (94.15%) for fixed-α fusion at α = 0.25. McNemar’s test for this comparison gave χ² = 3.20, p = 0.074, so adaptive gating was numerically more accurate, but the difference was not statistically significant at the 0.05 level. Simpler classifiers on the same three-dimensional fused vector reached comparable accuracy (95.91–96.49%), and none differed significantly from adaptive gating. These results indicate that confidence-aware decision-level fusion is feasible under unpaired, data-constrained conditions, and that its present value lies in interpretable per-sample modality weighting rather than in a demonstrated accuracy advantage. The single-patient histopathology test set precludes any claim of clinical generalizability; validation on larger, multi-institutional, patient-level cohorts remains necessary.
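The arithmetic behind the reported accuracies and the McNemar statistic can be checked with a short sketch. The abstract does not report the discordant-pair counts, so `b = 5` and `c = 0` below are an assumption: they are one split consistent with the five-tile difference and with χ² = 3.20 when the continuity-corrected form of the test is used. The function name is hypothetical.

```python
import math

def mcnemar_continuity_corrected(b, c):
    """McNemar's chi-square with continuity correction, 1 degree of freedom.

    b: tiles correct under method A only; c: tiles correct under method B only.
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

n_tiles = 171
acc_adaptive = 166 / n_tiles   # adaptive gating fusion
acc_fixed = 161 / n_tiles      # fixed-alpha fusion at alpha = 0.25

# Assumed discordant counts (not reported in the abstract).
b, c = 5, 0
chi2, p_value = mcnemar_continuity_corrected(b, c)

print(f"adaptive: {100 * acc_adaptive:.2f}%  fixed: {100 * acc_fixed:.2f}%")
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With only five discordant tiles under this assumption, the statistic rests on very few observations, which is in line with the abstract's caution about reading the difference as an accuracy advantage.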