Covid-19 Global Dataset

Up-to-date numbers of daily Confirmed, Death and Active cases for 218 countries

@kaggle.josephassaker_covid19_global_dataset

About this Dataset

For the latest analysis and visualizations of the COVID-19 pandemic, check out my constantly updated EDA notebook 📈.


Context

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the strain of coronavirus that causes coronavirus disease 2019 (COVID-19), the respiratory illness responsible for the COVID-19 pandemic.

Since its first identification in December 2019 in Wuhan, China, this virus has taken the world by storm. Some people prefer to look at the positive side of things and how this pandemic has brought forward several positive changes. However, the collateral damage caused by this pandemic cannot be overlooked. From its economic impact to its mental health impact, this pandemic will arguably be one of the hardest periods we'll encounter in our lives.
That being said, we always have to arm ourselves with hope. With the new advances in vaccine studies, let's hope to wake up from this nightmare as soon as possible.

“Hope is being able to see that there is light despite all of the darkness.” – Desmond Tutu

As for why I built this dataset: I couldn't get my hands on an easily digestible and up-to-date Covid-19 dataset, so I decided to build my own using Python and web scraping techniques.
I will also update this dataset as frequently as possible!

Content

This data was scraped from worldometers.info on 2022-05-14 by Joseph Assaker.

225 countries are represented in this data.

All countries have records dating from 2020-02-15 until 2022-05-14 (820 days per country).
The exceptions are China, which has records dating from 2020-01-22 until 2022-05-14 (844 days), and Palau, which has records dating from 2021-08-25 until 2022-05-14 (263 days).
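
The day counts above include both the first and the last date of each range. As a quick sanity check, here is a minimal sketch using only Python's standard library. The helper name `inclusive_days` and the `spans` dictionary are made up for this illustration and are not part of the dataset.

```python
from datetime import date


def inclusive_days(start: date, end: date) -> int:
    """Count calendar days from start to end, including both endpoints."""
    return (end - start).days + 1


last_day = date(2022, 5, 14)  # scrape date stated in this description

spans = {
    "most countries": inclusive_days(date(2020, 2, 15), last_day),
    "China": inclusive_days(date(2020, 1, 22), last_day),
    "Palau": inclusive_days(date(2021, 8, 25), last_day),
}

print(spans)  # {'most countries': 820, 'China': 844, 'Palau': 263}
```

If a country's row count in the daily table differs from these numbers, that likely points to missing or duplicated dates worth checking before any analysis.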

Summary Data Columns Description:

  • country: designates the Country in which the row's data was observed.
  • continent: designates the Continent of the observed country.
  • total_confirmed: designates the total number of confirmed cases in the observed country.
  • total_deaths: designates the total number of confirmed deaths in the observed country.
  • total_recovered: designates the total number of confirmed recoveries in the observed country.
  • active_cases: designates the number of active cases in the observed country.
  • serious_or_critical: designates the estimated number of cases in serious or critical conditions in the observed country.
  • total_cases_per_1m_population: designates the number of total cases per 1 million population in the observed country.
  • total_deaths_per_1m_population: designates the number of total deaths per 1 million population in the observed country.
  • total_tests: designates the number of total tests done in the observed country.
  • total_tests_per_1m_population: designates the number of total tests done per 1 million population in the observed country.
  • population: designates the population count in the observed country.
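
The three per-1-million columns can in principle be recomputed from the raw totals and the population column. The sketch below assumes the usual definition (count divided by population, times one million). The row values are made up for illustration, and the source site may round or update its figures differently, so small mismatches with the published columns would not be surprising.

```python
def per_million(count: int, population: int) -> float:
    """Scale a raw count to a rate per 1 million inhabitants."""
    return count * 1_000_000 / population


# Made-up illustrative row; these are NOT real values from the dataset.
row = {
    "country": "Exampleland",
    "population": 2_000_000,
    "total_confirmed": 50_000,
    "total_deaths": 400,
    "total_tests": 1_000_000,
}

cases_per_1m = per_million(row["total_confirmed"], row["population"])
deaths_per_1m = per_million(row["total_deaths"], row["population"])
tests_per_1m = per_million(row["total_tests"], row["population"])

print(cases_per_1m, deaths_per_1m, tests_per_1m)  # 25000.0 200.0 500000.0
```

Working with the per-1m columns rather than the raw totals makes countries of very different sizes comparable.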

Daily Data Columns Description:

  • date: designates the date of observation of the row's data in YYYY-MM-DD format.
  • country: designates the Country in which the row's data was observed.
  • cumulative_total_cases: designates the cumulative number of confirmed cases as of the row's date, for the row's country.
  • daily_new_cases: designates the daily new number of confirmed cases on the row's date, for the row's country.
  • active_cases: designates the number of active cases (i.e., confirmed cases that have neither recovered nor died) on the row's date, for the row's country.
  • cumulative_total_deaths: designates the cumulative number of confirmed deaths as of the row's date, for the row's country.
  • daily_new_deaths: designates the daily new number of confirmed deaths on the row's date, for the row's country.
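
The cumulative and daily columns describe the same series in two ways: a daily value should be the difference between consecutive cumulative values. Here is a minimal sketch of that relationship with made-up numbers. The function name is hypothetical, and treating the first day's daily value as equal to its cumulative value is an assumption; the real data may also contain retroactive corrections that break the identity on some dates.

```python
from typing import List


def daily_from_cumulative(cumulative: List[int]) -> List[int]:
    """Derive daily increments from a cumulative series for one country.

    The first entry is kept as-is (assumption: nothing was counted before
    the first recorded date); each later entry is the day-over-day change.
    """
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


# Made-up illustrative series, ordered by date; NOT real dataset values.
cumulative_total_cases = [10, 15, 15, 27]
daily_new_cases = daily_from_cumulative(cumulative_total_cases)

print(daily_new_cases)  # [10, 5, 0, 12]
```

The same check applies to cumulative_total_deaths and daily_new_deaths. A negative result would indicate that a cumulative count was revised downward.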

Acknowledgements

As previously mentioned, all the data present in this dataset is scraped from worldometers.info.

Inspiration

Going through this data, Kagglers can visualize various trends in their own country, or compare several countries.
One can also combine this dataset with other news and key points in time (lockdowns, the new UK mutation, holidays, etc.) in order to study the effects of these events on the progression of Covid-19 in a multitude of countries.
Implementing time series analysis on this dataset would also be an amazing idea! Getting a deep learning algorithm to learn from this sea of data and try to predict the future turn of events could be quite interesting!
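
As a possible first step toward time series analysis, daily counts are often smoothed before modeling, since reported numbers tend to fluctuate from day to day. Below is a minimal sketch of a trailing 7-day moving average in plain Python. The function name, the window length, and the sample values are assumptions for illustration only; this is a smoothing step, not a forecasting model.

```python
def moving_average(values, window=7):
    """Trailing mean over the last `window` values.

    Positions without a full window of history are returned as None.
    """
    smoothed = []
    running_sum = 0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / window if i >= window - 1 else None)
    return smoothed


# Made-up daily_new_cases for one country; NOT real dataset values.
daily_new_cases = [7, 14, 21, 0, 0, 35, 0, 14]
smoothed = moving_average(daily_new_cases)

print(smoothed)  # [None, None, None, None, None, None, 11.0, 12.0]
```

The smoothed series can then be compared across countries or fed into whatever forecasting approach you want to try.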
