Baselight

Country Data On COVID-19

COVID-19 dataset up to 21/02/2023

@kaggle.carlaoliveira_country_data_on_covid19

About this Dataset

Country Data On COVID-19

The data is in CSV format and includes all historical data on the pandemic up to 03/01/2023, following a 1-line format per country and date.

In the pre-processing of these data, missing data were checked. It was observed, for example, that the missing data referring to new_cases was where the total number of cases had not been changed and that most of the missing data related to vaccination, which actually at the beginning of the pandemic there was no data. Therefore, to solve these cases of missing data it was decided to replace the data containing “NaN” by zero.
Some of these features were combined to generate new features. This process that creates new features (data) from existing data, aiming to improve the data before applying machine learning algorithms, is called feature engineering. The new features created were:

  • Vaccination rate (vaccination_ratio'): total number of people who received at least one dose of vaccine divided by the population at risk. This dose number was chosen because it has a higher correlation with new deaths.
  • Prevalence: existing cases of the disease at a given time divided by the population at risk of having the disease. Formula: COVID-19 cases ÷ Population at risk * 100. Example: 168,331 ÷ 210,000,000 * 100 = 0.08.
  • Incidence: new cases of the disease in a defined population during a specific period (one day, for example) divided by the population at risk. Formula: New COVID-19 cases in one day ÷ Population - Total cases * 100. Example: 5,632 ÷ 209,837,301 * 100 = 0.0026.

Share link

Anyone who has the link will be able to view this.