Abstract
Persistent traffic congestion and the need for efficient traffic monitoring have increased demand for automated vehicle-analysis systems based on CCTV footage. This study presents a CCTV-based vehicle monitoring system that uses YOLOv12 and DeepSORT to integrate vehicle detection, tracking, counting, public/private vehicle class prediction, seven-category vehicle-type prediction, vehicle-color recognition, and traffic-state estimation. To reduce manual annotation effort during the initial training stage, a semi-automated method was developed that generates synthetic composite road scenes by combining cropped vehicle images with road-background images. The detector was first trained on 10,000 synthetic images and then sequentially fine-tuned on real CCTV data. Four real-world traffic video clips from Metro Manila were used. Three 5-min clips formed the staged refinement workflow: the first two were used for iterative refinement, and the third was used for post-refinement evaluation of the adapted model. A separate fourth CCTV clip was reserved exclusively for blind evaluation without on-the-fly retraining. On the final evaluation video, the system achieved average accuracies of 97% for public/private vehicle class prediction, 90% for seven-category vehicle-type prediction, 82% for vehicle-color recognition, and 96.67% for vehicle counting. The results show that synthetic pretraining combined with limited real-world fine-tuning can improve performance in CCTV-based vehicle monitoring while reducing the amount of manually labeled real-world data required. The study also discusses the limitations of the current evaluation protocol and the need for broader multi-location testing.
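The abstract does not describe how the synthetic composite scenes are built. The following is therefore only a minimal sketch of one common approach, not the study's method. Because each cropped vehicle is pasted at a known position, its bounding box is known, so detection labels can be written without manual annotation. The sketch assumes the standard YOLO text label format (`class cx cy w h`, normalized to [0, 1]) and simple non-overlapping random placement. The names `Placement`, `place_vehicles`, and `to_yolo_lines` are hypothetical. The pixel pasting itself, which would typically be done with an image library, is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of one pasted vehicle crop (pixel coordinates)."""
    class_id: int
    x: int  # left edge
    y: int  # top edge
    w: int
    h: int


def overlaps(a: Placement, b: Placement) -> bool:
    # Axis-aligned rectangle intersection test (touching edges do not count).
    return not (a.x + a.w <= b.x or b.x + b.w <= a.x or
                a.y + a.h <= b.y or b.y + b.h <= a.y)


def place_vehicles(bg_w, bg_h, crops, rng, max_tries=50):
    """Randomly place (class_id, w, h) crops on a bg_w x bg_h background.

    Assumption: crops must not overlap; a crop that cannot be placed
    within max_tries attempts (or is larger than the background) is skipped.
    """
    placed = []
    for class_id, w, h in crops:
        if w > bg_w or h > bg_h:
            continue
        for _ in range(max_tries):
            cand = Placement(class_id, rng.randint(0, bg_w - w),
                             rng.randint(0, bg_h - h), w, h)
            if all(not overlaps(cand, p) for p in placed):
                placed.append(cand)
                break
    return placed


def to_yolo_lines(placements, bg_w, bg_h):
    """Convert pixel boxes to YOLO-format label lines (normalized cx cy w h)."""
    lines = []
    for p in placements:
        cx = (p.x + p.w / 2) / bg_w
        cy = (p.y + p.h / 2) / bg_h
        lines.append(f"{p.class_id} {cx:.6f} {cy:.6f} "
                     f"{p.w / bg_w:.6f} {p.h / bg_h:.6f}")
    return lines


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical crops: (class_id, width, height) in pixels.
    crops = [(0, 120, 80), (3, 200, 150), (1, 90, 60)]
    placements = place_vehicles(1280, 720, crops, rng)
    for line in to_yolo_lines(placements, 1280, 720):
        print(line)
```

Generating labels from known paste positions is what removes the manual annotation step. A real pipeline would also need to decide how to scale crops to match road perspective and how to blend their edges, and the abstract specifies neither.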