Prediction of Soil Total Nitrogen Through Vis–NIR Spectroscopy and Machine Learning: From Model Comparison to Explainability
Shengchang Huai, Qingyue Zhang, Yuwen Jin et al.
20 May 2026

Abstract

Rapid and cost-effective estimation of soil total nitrogen (TN) is essential for soil fertility assessment and nutrient management. However, the performance of laboratory visible–near-infrared (Vis–NIR) models is shaped not only by preprocessing and modeling strategy but also by sample preparation and the soil's compositional background. In this study, TN prediction was evaluated using 376 topsoil samples from two contrasting datasets: Mollisols from the black-soil region of Northeast China and Ultisols from Qiyang County, Hunan Province, southern China. Spectra acquired over 350–2500 nm for three particle-size fractions were preprocessed using Savitzky–Golay smoothing combined with standard normal variate (SNV), first-derivative, or second-derivative transformations, and modeled using partial least squares regression (PLSR), support vector regression (SVR), and extreme gradient boosting (XGBoost). Models were developed with 5 × 5 nested cross-validation and then evaluated on a sample-grouped held-out test set. Among all combinations, XGBoost with first-derivative preprocessing on the 0.25 mm fraction performed best, with test R² values of 0.91 for the Mollisol dataset and 0.78 for the Ultisol dataset. Shapley additive explanations (SHAP) and principal component analysis (PCA) consistently identified informative spectral regions at 430–480 and 1330–1450 nm for Mollisol and at 585–635, 820–900, and 2180–2240 nm for Ultisol. Prediction errors were larger in the sampled Ultisol dataset and increased with dithionite–citrate–bicarbonate (DCB)-extractable Fe and mineral backgrounds. A second-stage log-domain residual correction incorporating ancillary soil properties further reduced the Ultisol RMSE from 0.30 to 0.27 g kg⁻¹. These findings support the 0.25 mm, first-derivative, XGBoost workflow as a robust laboratory Vis–NIR approach for TN prediction and indicate that composition-aware residual correction can improve prediction in oxide- and mineral-rich soils.
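The two less familiar steps named in the abstract, Savitzky–Golay first-derivative preprocessing (with SNV as an alternative) and a second-stage log-domain residual correction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The window length, polynomial order, and 1 nm sampling interval are assumptions, as is the linear form of the residual model; the function names are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def savgol_coeffs(window, polyorder, deriv=0, delta=1.0):
    """Savitzky-Golay weights from a local least-squares polynomial fit."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    # Columns are x^0, x^1, ..., x^polyorder.
    design = np.vander(x, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps the window to that coefficient.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv) / delta**deriv


def savgol_first_derivative(spectra, window=11, polyorder=2, delta=1.0):
    """Smoothed first derivative per spectrum; window // 2 points are dropped at each edge.

    window, polyorder and delta (sampling step in nm) are assumed values.
    """
    spectra = np.asarray(spectra, dtype=float)
    weights = savgol_coeffs(window, polyorder, deriv=1, delta=delta)
    return np.array([np.correlate(row, weights, mode="valid") for row in spectra])


def fit_log_residual_correction(y_true, y_pred, ancillary):
    """Fit log(y_true) - log(y_pred) as a linear function of ancillary covariates.

    Assumed form of the second stage; requires strictly positive TN values.
    """
    resid = np.log(y_true) - np.log(y_pred)
    design = np.column_stack([np.ones(len(resid)), ancillary])
    beta, *_ = np.linalg.lstsq(design, resid, rcond=None)
    return beta


def apply_log_residual_correction(y_pred, ancillary, beta):
    """Add the predicted log residual to log(y_pred) and back-transform."""
    design = np.column_stack([np.ones(len(y_pred)), ancillary])
    return np.exp(np.log(y_pred) + design @ beta)
```

In use, the first-stage predictions passed to fit_log_residual_correction would normally be out-of-fold predictions from the training data, so that the correction is not fitted on the samples later used to report test error.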

IPC Classification

G06A01

Keywords

prediction, soil, total, nitrogen, through, spectroscopy, machine, learning, model, comparison, explainability, systems, rapid, cost-effective, estimation, essential, fertility, assessment, nutrient, management, however, performance, laboratory, visible