Examining the Performance of Large Language Models in Health Information Quality Evaluation Tasks: An AI Post-Alignment Perspective
Shenghui Zhang, Jun Zhang, Peng Li et al.
July 2, 2026

Abstract

When large language models (LLMs) are applied to health information quality evaluation, ensuring that their outputs align with human judgment, logic, and strategic preferences has become a focal point of current AI alignment research. This study proposes the Health Information Quality Evaluation Framework based on AI Post-Alignment (HIQE-PA) to examine whether LLMs remain consistent with expert judgment standards in health information quality evaluation tasks. Using expert evaluation results as the benchmark, the framework assesses the consistency, stability, and cross-source adaptability of model outputs through structured task implementation and multidimensional statistical indicators. In the experiments, ERNIE-3.5 and ChatGLM2-6B-32K served as general-purpose models, and the corresponding ERNIE-3.5 + HIQE-PA and ChatGLM2-6B-32K + HIQE-PA models were constructed to compare post-alignment performance under different model conditions. The models' post-alignment performance on the health information quality evaluation task was examined using three indicators: deviation degree, predictability, and fitness. The results show that the general-purpose LLMs achieved high post-alignment performance only in the readability and completeness dimensions, whereas the LLMs + HIQE-PA improved post-alignment performance across all dimensions. The ERNIE-3.5 + HIQE-PA model performed especially well, producing evaluation outputs closer to expert consensus and remaining consistent with the expert benchmark across different text sources. This study demonstrates that post-alignment examinations can provide empirical support for model selection and governance in the health information domain.
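The abstract does not give formulas for deviation degree, predictability, or fitness, so the sketch below does not reproduce HIQE-PA. It only illustrates, under stated assumptions, the general idea of benchmarking model scores against expert scores dimension by dimension. It uses two generic agreement statistics as stand-ins: mean absolute deviation (how far model scores sit from expert scores) and Pearson correlation (whether model scores track the expert trend). The function names `mean_absolute_deviation`, `pearson_r`, and `compare_to_experts`, the dimension-to-score-list data layout, and the sample scores are all hypothetical.

```python
from math import sqrt
from statistics import mean


# Hypothetical stand-in metric, NOT the paper's "deviation degree".
def mean_absolute_deviation(model_scores, expert_scores):
    """Average absolute gap between model and expert scores (0 means identical)."""
    if len(model_scores) != len(expert_scores) or not model_scores:
        raise ValueError("score lists must be non-empty and equally long")
    return mean(abs(m - e) for m, e in zip(model_scores, expert_scores))


# Hypothetical stand-in metric, NOT the paper's "predictability" or "fitness".
def pearson_r(model_scores, expert_scores):
    """Pearson correlation between paired model and expert scores."""
    if len(model_scores) != len(expert_scores) or len(model_scores) < 2:
        raise ValueError("need at least two paired scores")
    mx, my = mean(model_scores), mean(expert_scores)
    cov = sum((m - mx) * (e - my) for m, e in zip(model_scores, expert_scores))
    sx = sqrt(sum((m - mx) ** 2 for m in model_scores))
    sy = sqrt(sum((e - my) ** 2 for e in expert_scores))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def compare_to_experts(model, expert):
    """Per-dimension report; both arguments map dimension name -> list of scores."""
    report = {}
    for dim, expert_scores in expert.items():
        model_scores = model[dim]
        report[dim] = {
            "mad": mean_absolute_deviation(model_scores, expert_scores),
            "r": pearson_r(model_scores, expert_scores),
        }
    return report


if __name__ == "__main__":
    # Illustrative scores for four texts on two dimensions named in the abstract.
    expert = {"readability": [4, 3, 5, 2], "completeness": [3, 3, 4, 1]}
    model = {"readability": [4, 3, 4, 2], "completeness": [2, 4, 4, 1]}
    for dim, stats in compare_to_experts(model, expert).items():
        print(f"{dim}: MAD={stats['mad']:.2f}, r={stats['r']:.2f}")
```

A lower MAD together with a higher r on a dimension would indicate closer agreement with the expert benchmark. Applying the same report to outputs from different text sources is one simple way to probe cross-source consistency.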

Keywords

examining, performance, large language models, health information, quality evaluation, tasks, post-alignment, perspective, LLMs, implemented, field, ensuring, outputs, align, human judgment, logic, strategic preferences