Data Wrangling: All Weekly Excess Deaths

Emmanuel Olaosebikan
Jun 5, 2021 · 4 min read
Image Source: AP IMAGE

There are a lot of deaths occurring almost every day in every country, caused by accidents, poverty, diseases (or viruses), etc. Some countries record these deaths as data so as to understand what causes them and how many occur per week. Now this data is going to go through a process called “Data Wrangling” to clean it up for analysis. For this process, I used a Python library called “Pandas”, which is used heavily in Data Science to load files and perform different operations on them.

So I start by importing pandas with the alias ‘pd’ to save the stress of typing the full name every time.

#importing necessary modules
import pandas as pd

Then the next thing is to open the required file using pandas’ built-in “read_csv()” function, which reads a CSV (comma-separated values) file, to begin the wrangling operation with it, and then print the first 5 rows.

#loading the file
weekly_deaths_df = pd.read_csv('weekly_deaths.csv')
weekly_deaths_df.head()
Head of the data set

The next step is to check whether we have a null value in any of the columns by using the “isnull()” function.

#Check for null rows
weekly_deaths_df.isnull().sum()
Sum of all null values

So since there is no null value in the data set, there is no need to call the “dropna()” method, which is used to drop any row that includes a null value. If you observe the data, the region and country columns have the same values, so I’d just remove the ‘region’ column from the whole data (a short sketch of the dropna call, had it been needed, is shown right below).
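For reference only, this is a minimal sketch of how the null rows would have been dropped if the isnull() check above had found any:

#drop rows containing null values (only needed if nulls exist)
weekly_deaths_df = weekly_deaths_df.dropna()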

#drop the region column
weekly_deaths_df.drop('region', axis=1, inplace=True)
weekly_deaths_df.head()
The head of the data set

Now to check the overall information about the data set, I use the ‘info()’ function.

#Check the file info
weekly_deaths_df.info()

Then I also check the statistical description of the whole data using the ‘describe()’ function.

#check the file description
weekly_deaths_df.describe()
The description of the data set

Since we’re already cleaning the data and getting it ready for analysis, we would want to check if there are any duplicates in the data and, if there are, drop them. We check that by using the ‘duplicated()’ method to find duplicates and the ‘drop_duplicates()’ method to drop them. But first of all, we check whether any duplicates actually exist; if not, there is no need to drop anything (a small sketch of the conditional drop follows the check below).

#check for duplicates
weekly_deaths_df.duplicated().sum()
Sum of Duplicates
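As a minimal sketch, assuming the check above had returned a non-zero count, the duplicates could be dropped like this:

#drop duplicate rows only if any were found
if weekly_deaths_df.duplicated().sum() > 0:
    weekly_deaths_df = weekly_deaths_df.drop_duplicates()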

The next thing to do is to check the unique values of each column in the data set. That is done by using the ‘nunique()’ or ‘unique()’ method.
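The exact call isn’t shown in the screenshot below, but a minimal sketch of this step on the same DataFrame would look like this (the ‘year’ column name is an assumption about this dataset):

#number of unique values in each column
weekly_deaths_df.nunique()

#unique values of a single column ('year' is an assumed column name)
weekly_deaths_df['year'].unique()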

The Unique values of the data set

From the unique values, we have just two years in the dataset, which are 2020 and 2021, so the records run from 2020 to 2021. From the data as well, it was observed that the rate at which people died due to SARS-CoV-2 reduced in 2021, because different suppressing treatments came up which could help manage the virus.
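A quick sketch of how that year-to-year comparison could be checked, assuming the dataset has a ‘year’ column and a deaths column named ‘total_deaths’ (both column names are assumptions, since the full schema isn’t shown here):

#compare total recorded deaths per year
#('year' and 'total_deaths' are assumed column names)
weekly_deaths_df.groupby('year')['total_deaths'].sum()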

The data set

This is not all there is to the weekly excess deaths in the countries covered by the dataset. From the data set, we can also get the number of countries, which was 45, and it was gotten by using the code below:

#check for the number of countries
weekly_deaths_df.country.nunique()
The number of countries present in the data set

So further analysis would be done and uploaded very soon, and it would be focused majorly on ‘Data Visualization’. The code used for this piece can be gotten here: Link (located in my GitHub profile). Thanks for reading!


Emmanuel Olaosebikan

A student of the Federal University of Agriculture, Abeokuta, skilled in Web Development, Graphics Design and Python Programming.