Data Wrangling: All Weekly Excess Deaths
Deaths occur every day in every country, caused by accidents, poverty, diseases, and more. Some countries record these deaths as data so as to understand what causes them and how many occur per week. This data will now go through a process called “data wrangling” to clean it up for analysis. For this process, I used Pandas, a Python library used widely in data science to load files and perform different operations on them.
So I start by importing pandas under the alias ‘pd’ to save the stress of typing the full name each time.
#importing necessary modules
import pandas as pd
The next thing is to open the required file using pandas’ built-in “read_csv()” function, which reads a CSV (comma-separated values) file, to begin the wrangling operation, then print the first 5 rows.
#loading the file
weekly_deaths_df = pd.read_csv('weekly_deaths.csv')
weekly_deaths_df.head()
The next step is to check whether any of the columns contain null values, using the “isnull()” function.
#Check for null rows
weekly_deaths_df.isnull().sum()
Since there is no null value in the dataset, there is no need to call the “dropna()” method, which drops rows containing null values. If you observe the data, the ‘region’ and ‘country’ columns have the same values, so I’d just remove the ‘region’ column from the data.
#drop the region column
weekly_deaths_df.drop('region', axis=1, inplace=True)
weekly_deaths_df.head()
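Had any nulls turned up in the earlier check, the conditional drop would look like this minimal sketch (the small frame here is a toy stand-in for the real dataset, not its actual columns):

```python
import pandas as pd

# Toy frame with one null value, standing in for the real dataset
df = pd.DataFrame({
    'country': ['Nigeria', 'Ghana', None],
    'week': [1, 2, 3],
})

# Drop rows containing nulls only when any exist
if df.isnull().sum().sum() > 0:
    df = df.dropna()

print(len(df))  # the row with the null country is gone
```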
Now, to check the overall information about the dataset, I use the ‘info()’ function.
#Check the file info
weekly_deaths_df.info()
Then I also check a statistical summary of the data using the ‘describe()’ function.
#check the file description
weekly_deaths_df.describe()
Since we’re already cleaning the data and getting it ready for analysis, we want to check whether there are any duplicates in the data, and if there are, drop them. We check with the ‘duplicated()’ method and drop with ‘drop_duplicates()’. But first of all, we check whether any duplicates actually exist; if not, there is no need to drop anything.
#check for duplicates
weekly_deaths_df.duplicated().sum()
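The check-then-drop pattern described above can be sketched on a toy frame (the column names here are illustrative, not the dataset’s real columns):

```python
import pandas as pd

# Toy frame containing one exact duplicate row
df = pd.DataFrame({
    'country': ['Nigeria', 'Nigeria', 'Ghana'],
    'week': [1, 1, 2],
})

# Drop duplicates only when the check finds any
if df.duplicated().sum() > 0:
    df = df.drop_duplicates()

print(len(df))  # the duplicated row has been removed
```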
The next thing to do is to check the unique values in each column of the dataset. That is done using the ‘nunique()’ or ‘unique()’ method.
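On the real data that would be `weekly_deaths_df.nunique()`; the difference between the two methods can be seen on a small toy frame:

```python
import pandas as pd

# Toy frame standing in for the dataset
df = pd.DataFrame({
    'country': ['Nigeria', 'Ghana', 'Nigeria'],
    'year': [2020, 2021, 2021],
})

# nunique() counts the distinct values in every column
print(df.nunique())         # country: 2, year: 2

# unique() lists the distinct values of one column
print(df['year'].unique())  # [2020 2021]
```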
From the unique values, there are just two years in the dataset, 2020 and 2021, so the records run from 2020 to 2021. It was also observed from the data that the rate at which people died due to SARS-CoV-2 reduced in 2021, as different treatments that could suppress the virus became available.
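A year-on-year comparison like that can be checked with a group-by sum. The ‘excess_deaths’ column name below is an assumption for illustration, since the post does not list every column in the dataset:

```python
import pandas as pd

# Toy data; 'excess_deaths' is a hypothetical column name
df = pd.DataFrame({
    'year': [2020, 2020, 2021, 2021],
    'excess_deaths': [500, 700, 300, 200],
})

# Total excess deaths per year
totals = df.groupby('year')['excess_deaths'].sum()
print(totals)  # 2020's total is higher than 2021's
```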
That is not all about the weekly excess deaths in the countries covered by the dataset. From the dataset, we can also get the number of countries, which is 45, using the code below:
#check for the number of countries
weekly_deaths_df.country.nunique()
Further analysis, focused mainly on data visualization, will be done and uploaded very soon. The code used for this piece can be found here Link in my GitHub profile. Thanks for reading!