Baselight

Covid 19 Dataset

Analyse Covid-19 Dataset with Python

@kaggle.rohitgrewal_covide_19_dataset


About this Dataset

Covid 19 Dataset

📹Project Video available on YouTube - https://youtu.be/89eYAAPyRfo


This dataset contains global records of COVID-19 cases reported on April 29, 2020. It includes data for multiple countries and regions, showing the number of confirmed cases, deaths, and recoveries due to the coronavirus. The dataset is useful for analyzing the impact of the pandemic across different regions and can be used for visualization, comparison, or statistical modeling.

This data is available as a CSV file. We are going to analyze this dataset using a Pandas DataFrame.
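As a starting point, the CSV can be read into a DataFrame roughly like this (a minimal sketch; the filename 'covid_19_data.csv' is an assumption, so use the path of your downloaded copy):

import pandas as pd

# Read the CSV into a DataFrame ('covid_19_data.csv' is an assumed local filename).
df = pd.read_csv('covid_19_data.csv')

# Quick look at the first few rows.
print(df.head())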


Using this dataset, we answered multiple questions with Python in our Project.

Q.1) Show the number of Confirmed, Deaths and Recovered cases in each Region.
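A minimal sketch for this question, assuming df is the DataFrame loaded above and the column names match the feature list (Region, Confirmed, Deaths, Recovered):

# Sum the Confirmed, Deaths and Recovered counts per Region.
df.groupby('Region')[['Confirmed', 'Deaths', 'Recovered']].sum()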

Q.2) Remove all the records where the Confirmed cases are less than 10.
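One way to do this with boolean filtering (a sketch, under the same df and column-name assumptions):

# Keep only the records with at least 10 confirmed cases.
df = df[df['Confirmed'] >= 10]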

Q.3) In which Region was the maximum number of Confirmed cases recorded?
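A possible sketch using groupby and sort_values (same assumptions as above):

# Region with the highest total of confirmed cases.
df.groupby('Region')['Confirmed'].sum().sort_values(ascending=False).head(1)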

Q.4) In which Region was the minimum number of Deaths recorded?
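The same pattern with ascending order answers this one (a sketch; note that several regions may tie at zero deaths):

# Region with the lowest total of reported deaths.
df.groupby('Region')['Deaths'].sum().sort_values(ascending=True).head(1)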

Q.5) How many Confirmed, Deaths & Recovered cases were reported from India up to 29 April 2020?
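Filtering on the Region column gives the India figures (a sketch; 'India' is assumed to be the exact label used in the Region column):

# Records reported for India, summed in case there is more than one row.
df[df['Region'] == 'India'][['Confirmed', 'Deaths', 'Recovered']].sum()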

Q.6-A) Sort the entire data with respect to the number of Confirmed cases in ascending order.
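A sketch with sort_values (same assumptions):

# Entire DataFrame sorted by Confirmed cases, ascending.
df.sort_values(by='Confirmed', ascending=True)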

Q.6-B) Sort the entire data with respect to the number of Recovered cases in descending order.
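And the descending variant (same assumptions):

# Entire DataFrame sorted by Recovered cases, descending.
df.sort_values(by='Recovered', ascending=False)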


These are the main Features/Columns available in the dataset :

  1. Date : Represents the date when the data was recorded. In this dataset, all records are from April 29, 2020.

  2. Region : The name of the country or territory for which COVID-19 data is recorded.

  3. Confirmed : The total number of confirmed COVID-19 cases reported in that region as of April 29, 2020.

  4. Deaths : The total number of deaths attributed to COVID-19 in that region as of April 29, 2020.

  5. Recovered : The total number of individuals who recovered from COVID-19 in that region by the recorded date.


The commands that we used in this project :

  • import pandas as pd - To import the Pandas library.
  • pd.read_csv() - To read the CSV file into a DataFrame in the Jupyter notebook.
  • df.count() - Counts the number of non-null values in each column.
  • df.isnull().sum() - Counts the missing values in each column of the DataFrame.
  • import seaborn as sns - To import the Seaborn library.
  • import matplotlib.pyplot as plt - To import the Matplotlib library.
  • sns.heatmap(df.isnull()) - Shows all columns and their missing values as a heat map.
  • plt.show() - To display the plot.
  • df.groupby('Col_name') - Forms groups from all unique values of the column.
  • df.sort_values(by=['Col_name']) - Sorts the entire DataFrame by the values of the given column.
  • df[df.Col_1 == 'Element1'] - Filtering: selects only the records where Col_1 equals 'Element1'.
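Put together, the missing-value check from these commands looks roughly like this (a sketch; the filename is an assumption):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read the CSV ('covid_19_data.csv' is an assumed local filename).
df = pd.read_csv('covid_19_data.csv')

# Non-null counts and missing-value counts per column.
print(df.count())
print(df.isnull().sum())

# Visualise the missing values as a heat map.
sns.heatmap(df.isnull())
plt.show()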

Tables

Covid 19 Data

@kaggle.rohitgrewal_covide_19_dataset.covid_19_data
  • 12.09 KB
  • 321 rows
  • 6 columns

CREATE TABLE covid_19_data (
  "date" TIMESTAMP,
  "state" VARCHAR,
  "region" VARCHAR,
  "confirmed" BIGINT,
  "deaths" BIGINT,
  "recovered" BIGINT
);
