PANACEA dataset - Heterogeneous COVID-19 Claims
This dataset contains a heterogeneous set of True and False COVID claims and online
sources of information for each claim. The claims have been obtained from online
sources, existing datasets and research challenges. It combines data sources with different
foci, enabling a comprehensive approach that spans different media (Facebook, general
websites, academia), information domains (health, scholar, media), types (news, claims)
and applications (information retrieval, veracity evaluation).
The dataset contains 5,143 claims (1,810 False and 3,333 True); the SMALL version contains
1,709 claims (477 False and 1,232 True).
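As a quick check on these figures, the sketch below recomputes the label balance of both versions using only the counts stated above. The `ClaimCounts` class is a hypothetical helper written for illustration; it is not part of the dataset distribution and does not read the actual data files. The majority baseline (always predicting True) is included because both versions are imbalanced toward True claims, so it gives a rough floor for any veracity classifier evaluated on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimCounts:
    """Per-label claim counts for one version of the dataset (hypothetical helper)."""
    name: str
    false: int
    true: int

    @property
    def total(self) -> int:
        # Total number of claims in this version.
        return self.false + self.true

    def proportions(self) -> dict:
        # Fraction of claims carrying each label.
        return {"False": self.false / self.total, "True": self.true / self.total}

    def majority_baseline_accuracy(self) -> float:
        # Accuracy of always predicting the most frequent label.
        return max(self.false, self.true) / self.total


# Counts as stated in the dataset description.
FULL = ClaimCounts("full", false=1810, true=3333)
SMALL = ClaimCounts("SMALL", false=477, true=1232)

if __name__ == "__main__":
    for version in (FULL, SMALL):
        p = version.proportions()
        print(f"{version.name}: {version.total} claims, "
              f"{p['False']:.1%} False / {p['True']:.1%} True, "
              f"majority baseline {version.majority_baseline_accuracy():.1%}")
```

Run as a script, this prints 35.2% False / 64.8% True for the full dataset and 27.9% False / 72.1% True for the SMALL version, so a classifier on the SMALL version must exceed roughly 72% accuracy to beat the trivial "always True" prediction.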
Please cite the following article when using the dataset:
Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims.
2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Jul. 2022.