Country Data On COVID-19
COVID-19 dataset up to 21/02/2023
@kaggle.carlaoliveira_country_data_on_covid19
The data is in CSV format and includes all historical data on the pandemic up to 03/01/2023, with one row per country and date.
During pre-processing, the data were checked for missing values. It was observed, for example, that new_cases was missing on dates where the total number of cases had not changed, and that most of the missing values were in the vaccination columns, for which no data existed at the beginning of the pandemic. These missing values were therefore handled by replacing every “NaN” with zero.
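The consistency check and the NaN-to-zero replacement described above can be sketched in plain Python. This is a minimal sketch, not the author's code: the row-as-dict layout, the helper names (`is_missing`, `missing_new_cases_match_flat_totals`, `fill_missing_with_zero`) and the toy rows are hypothetical and used only for illustration.

```python
import math

# Hypothetical toy rows for illustration only (not taken from the dataset).
# float("nan") stands in for the "NaN" entries found in the CSV.
NAN = float("nan")
rows = [
    {"location": "A", "date": "2020-03-01", "total_cases": 10.0, "new_cases": 10.0, "people_vaccinated": NAN},
    {"location": "A", "date": "2020-03-02", "total_cases": 10.0, "new_cases": NAN, "people_vaccinated": NAN},
    {"location": "A", "date": "2020-03-03", "total_cases": 15.0, "new_cases": 5.0, "people_vaccinated": NAN},
]


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_new_cases_match_flat_totals(rows):
    """Check that every missing new_cases falls on a date where total_cases
    equals the previous date's total for the same location.
    Assumes rows are sorted by date within each location."""
    previous_total = {}
    for row in rows:
        location = row["location"]
        if is_missing(row["new_cases"]) and previous_total.get(location) != row["total_cases"]:
            return False
        previous_total[location] = row["total_cases"]
    return True


def fill_missing_with_zero(rows, columns):
    """Return new rows in which missing values in `columns` are replaced by 0.0."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        for column in columns:
            if is_missing(new_row.get(column)):
                new_row[column] = 0.0
        filled.append(new_row)
    return filled


print(missing_new_cases_match_flat_totals(rows))  # True
filled = fill_missing_with_zero(rows, ["new_cases", "people_vaccinated"])
print(filled[1]["new_cases"], filled[0]["people_vaccinated"])  # 0.0 0.0
```

If the CSV is loaded with pandas instead, the replacement step itself corresponds to `df.fillna(0)`; the check is what justifies treating a missing new_cases as zero rather than as unknown.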
Some of the dataset's features were combined to generate new ones. This process of creating new features from existing data, in order to improve the data before machine learning algorithms are applied, is called feature engineering. The new features created were vaccination_ratio, prevalence and incidence, which appear as the last three columns of the table schema:
CREATE TABLE df_covid19_countries (
"location" VARCHAR,
"date" TIMESTAMP,
"total_cases" DOUBLE,
"new_cases" DOUBLE,
"new_cases_smoothed" DOUBLE,
"total_deaths" DOUBLE,
"new_deaths" DOUBLE,
"new_deaths_smoothed" DOUBLE,
"reproduction_rate" DOUBLE,
"total_vaccinations" DOUBLE,
"people_vaccinated" DOUBLE,
"people_fully_vaccinated" DOUBLE,
"total_boosters" DOUBLE,
"population" DOUBLE,
"vaccination_ratio" DOUBLE,
"prevalence" DOUBLE,
"incidence" DOUBLE
);
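The source does not give the formulas for the three engineered columns, so the following is only a sketch under stated assumptions: it takes vaccination_ratio as people_vaccinated / population, prevalence as total_cases / population and incidence as new_cases / population. The column names come from the schema; the formulas, the helper `derive_features` and the example row are hypothetical.

```python
# Column names come from the schema above; the formulas are ASSUMPTIONS,
# because the source does not say how the engineered columns were computed.
ENGINEERED_COLUMNS = {
    # new column: (numerator column, denominator column)
    "vaccination_ratio": ("people_vaccinated", "population"),
    "prevalence": ("total_cases", "population"),
    "incidence": ("new_cases", "population"),
}


def derive_features(row):
    """Return a copy of one row (a dict keyed by column name) with the three
    engineered columns added. Expects missing values to be already set to 0."""
    out = dict(row)
    for new_col, (num_col, den_col) in ENGINEERED_COLUMNS.items():
        denominator = row[den_col]
        # Guard against division by zero (e.g. a population of 0 after NaN-filling).
        out[new_col] = row[num_col] / denominator if denominator else 0.0
    return out


# Hypothetical row for illustration only (not taken from the dataset).
example = {
    "location": "A",
    "date": "2021-06-01",
    "population": 1000.0,
    "people_vaccinated": 250.0,
    "total_cases": 50.0,
    "new_cases": 5.0,
}
features = derive_features(example)
print(features["vaccination_ratio"], features["prevalence"], features["incidence"])  # 0.25 0.05 0.005
```

Keeping the definitions in one mapping makes it easy to swap in a different assumption, for example people_fully_vaccinated as the numerator of vaccination_ratio, or a per-100,000 scale for prevalence and incidence.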