FakeCovid Fact-Checked News Dataset
International Coverage of COVID-19 in 40 Languages from 105 Countries
By [source]
About this dataset
The FakeCovid dataset is an unparalleled compilation of 7623 fact-checked news articles related to COVID-19. Obtained from 92 fact-checking websites located in 105 countries, this comprehensive collection covers a wide range of sources and languages, including locations across Africa, Europe, Asia, The Americas and Oceania. With data gathered from references on Poynter and Snopes, this unique dataset is an invaluable resource for researching the accuracy of global news related to the pandemic. It offers an invaluable insight into the international nature of COVID information with its column headers covering country's involved; categories such as coronavirus health updates or political interference during coronavirus; URLs for referenced articles; verifiers employed by websites; article classes that can range from true to false or even mixed evaluations; publication dates ; article sources injected with credibility verification as well as article text and language standardization. This one-of-a kind dataset serves as an essential tool in understanding both global information flow around the world concerning COVID 19 while simultaneously offering transparency into whose interests guide it
More Datasets
For more datasets, click here.
Featured Notebooks
- 🚨 Your notebook can be here! 🚨!
How to use the dataset
The FakeCovid dataset is a multilingual cross-domain collection of 7623 fact-checked news articles related to COVID-19. It is collected from 92 fact-checking websites and covers a wide range of sources and countries, including locations in Africa, Asia, Europe, The Americas, and Oceania. This dataset can be used for research related to understanding the truth and accuracy of news sources related to COVID-19 in different countries and languages.
To use this dataset effectively, you will need basic knowledge of data science principles such as data manipulation with pandas or Python libraries such as NumPy or ScikitLearn. The data is in CSV (comma separated values) format that can be read by most spreadsheet applications or text editor like Notepad++.
Here are some steps on how to get started:
- Access the FakeCovid Fact Checked News Dataset from Kaggle: https://www.kaggle.com/c/fakecovidfactcheckednewsdataset/data
- Download the provided CSV file containing all fact checked news articles and place it into your desired folder location
- Load the CSV file into your preferred software application like Jupyter Notebook or RStudio 4)Explore your dataset using built-in functions within data science libraries such as Pandas & matplotlib – find meaningful information through statistical analysis &//or create visualizations 5)Modify parameters within the csv file if required & save 6)Share your creative projects through Gitter chatroom #fakecovidauthors 7 )Publish any interesting discoveries you find within open source repositories like GitHub 8 )Engage with our Hangouts group #FakeCoviDFactCheckersClub 9 )Show off fun graphics via Twitter hashtag #FakeCovidiauthors 10 )Reach out if you have further questions via email contactfakecovidadatateam 11 )Stay connected by joining our mailing list#FakeCoviDAuthorsGroup
We hope this guide helps you better understand how to use our FakeCoviD Fact Checked News Dataset for generating meaningful insights relating to COVID-19 news articles worldwide!
Research Ideas
- Developing an automated algorithm to detect fake news related to COVID-19 by leveraging the fact-checking flags and other results included in this dataset for machine learning and natural language processing tasks.
- Training a sentiment analysis model on the data to categorize articles according to their sentiments which can be used for further investigations into why certain news topics or countries have certain outcomes, motivations, or behaviors due to their content relatedness or author biasness(if any).
- Using unsupervised clustering techniques, this dataset could be used as a tool for identifying any discrepancies between news circulated in different populations in different countries (langauge and regions) so that publicists can focus more on providing factual information rather than spreading false rumors or misinformation about the pandemic
Acknowledgements
If you use this dataset in your research, please credit the original authors.
Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: FakeCovid_July2020.csv
Column name |
Description |
ref_category_title |
The title of the category the article belongs to. (String) |
ref_url |
The URL of the article. (String) |
verifiedby |
The name of the fact checking website that verified it. (String) |
country |
The country the article is from. (String) |
class |
Whether the article is true, false, or mixed. (String) |
published_date |
The date the article was published. (Date) |
country1 |
The first country the article is related to. (String) |
country2 |
The second country the article is related to. (String) |
country3 |
The third country the article is related to. (String) |
country4 |
The fourth country the article is related to. (String) |
article_source |
The source of the article. (String) |
ref_source |
The source of the reference. (String) |
source_title |
The title of the source. (String) |
content_text |
The text of the article. (String) |
lang |
The language of the article. (String) |
Acknowledgements
If you use this dataset in your research, please credit the original authors.
If you use this dataset in your research, please credit .