Machine Learning Applications with Sensors for Indoor Air Quality Research


Cosmina-Mihaela Rosca, Adrian Stancu

6 May 2026

Abstract

People now spend over 80% of their lives indoors, which makes indoor air quality (IAQ) research important. This paper presents (1) a structured overview of publicly available IAQ datasets suitable for machine learning (ML) research; (2) a comparative analysis of the reviewed datasets; (3) an ML-oriented mapping between tasks and algorithms, outlining the algorithmic families most appropriate for a given dataset structure and prediction target; and (4) an investigation of IAQ–ML studies that use custom-made solutions with sensors for data acquisition. The methodology covered 1162 papers from the Web of Science, 1536 from Scopus, and 756 from IEEE Xplore, published between 1 January 2020 and 31 December 2025, to capture recent trends in ML-based IAQ research. The findings show that linear regression (132 articles), logistic regression (91), random forest (RF, 77), Long Short-Term Memory (LSTM, 77), Principal Component Analysis (63), and Elastic Net are the most popular methods among researchers. Most studies report accuracy above 90%, with maximum values of 99.37% for LSTM and 99.20% for RF. For regression tasks, R² values range between 82% and 98%, especially for CO₂ and PM2.5 prediction, while eXtreme Gradient Boosting and hybrid RF-LSTM architectures achieve R² values of up to 99%. The public and private IAQ datasets analyzed in this study provide a strong foundation for transfer learning, but the differences between them require careful preprocessing to ensure consistent comparisons and reliable conclusions. Among the studies grouped by sensor type for IAQ parameters, linear regression remains the most widely used ML method (26 studies), followed by LSTM (19) and RF (18). The results confirm that there is no universal algorithm for IAQ and that the quality and structure of the data contribute to the success of ML models. This study aims to serve as a foundation for the development of future intelligent IAQ monitoring systems.
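The abstract reports the coefficient of determination (R²) as a percentage. As a minimal sketch of how such a figure can arise, the example below fits a one-variable linear regression, the most frequently used method in the review, and computes R². The occupancy and CO₂ readings are invented for illustration and are not taken from any reviewed dataset. The helper names `fit_line` and `r_squared` are also hypothetical.

```python
# Illustrative sketch only: the numbers below are synthetic, not from the paper.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical readings: room occupancy (people) vs. CO2 concentration (ppm).
occupancy = [0, 1, 2, 3, 4, 5, 6, 7]
co2_ppm = [420, 470, 540, 575, 650, 690, 760, 800]

intercept, slope = fit_line(occupancy, co2_ppm)
predictions = [intercept + slope * x for x in occupancy]
r2 = r_squared(co2_ppm, predictions)

# R2 is a fraction; multiplying by 100 gives the percentage form.
print(f"R2 = {r2:.4f} ({100 * r2:.2f}%)")
```

An R² computed on the training data, as here, tends to be optimistic. Comparisons across studies are more meaningful when the value is measured on held-out data.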

Metadata

DOI: 10.3390/s26092909
License: CC BY 4.0

IPC Classification

G06

