Evaluating Automated Event Databases for Event Forecasting: A Comparative Analysis of GDELT and POLECAT

Kelang Zhao, Zexin Fu, Yan Pan et al.

June 30, 2026

Abstract

To address the lack of systematic quantitative evaluation of automated event repositories in forecasting tasks, this study compares the Global Database of Events, Language, and Tone (GDELT) with the emerging Political Event Classification, Attributes, and Types (POLECAT) dataset, aiming to provide a basis for data source selection through multidimensional comparison. The primary research question is how to establish a structured, quantitative framework that reliably evaluates these data sources in specific predictive contexts. The study constructs a quantitative framework covering scale, coverage, redundancy, and accuracy, and conducts empirical forecasting tests across multiple cities. The results indicate that GDELT offers large scale and high media coverage but performs poorly on redundancy and domain accuracy. POLECAT, although smaller in scale, exhibits high domain identification accuracy and extremely low redundancy, and forecasts built on it show higher precision and better control of false positives. The study concludes that GDELT suits macro-level early warning scenarios requiring high recall, whereas POLECAT is better suited to tasks requiring high signal-to-noise inputs and to studies of specific regions; the choice between the two should weigh model requirements against the application scenario.
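As a rough illustration of two of the quantities named above, the following Python sketch computes a redundancy rate for a set of event records and the precision and false positive rate of binary forecasts. It is not the authors' implementation: the record fields (`date`, `city`, `event_type`), the duplicate key, and both function names are assumptions chosen for the example, and the abstract does not specify how the study defines these metrics.

```python
from collections import Counter


def redundancy_rate(records, key_fields=("date", "city", "event_type")):
    """Share of records that repeat another record under the chosen key.

    The key fields are hypothetical; a real study would choose its own
    definition of what counts as a duplicate event.
    """
    if not records:
        return 0.0
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    # Every occurrence beyond the first for a given key is a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(records)


def precision_and_fpr(predicted, actual):
    """Precision and false positive rate for paired binary forecasts."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, fpr


# Toy data, invented purely for illustration.
sample_records = [
    {"date": "2024-01-01", "city": "A", "event_type": "protest"},
    {"date": "2024-01-01", "city": "A", "event_type": "protest"},
    {"date": "2024-01-02", "city": "A", "event_type": "protest"},
    {"date": "2024-01-02", "city": "B", "event_type": "strike"},
]
sample_redundancy = redundancy_rate(sample_records)
sample_precision, sample_fpr = precision_and_fpr([1, 1, 0, 0], [1, 0, 0, 1])
```

Under these assumed definitions, a source with many near-identical reports of the same event would score a high redundancy rate, and a forecast that raises many alarms on quiet days would show a high false positive rate even if its recall is high.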

Metadata

DOI: 10.3390/data11070158

License: CC BY 4.0

IPC Classification

G06

Keywords

evaluating, automated, event, databases, forecasting, comparative, analysis, gdelt, polecat, data, address, lack, systematic, quantitative, evaluation, repositories, tasks, selected, global, database, events, language, tone, emerging
