Data Wrangling!

Emmanuel Olaosebikan
4 min read · Jun 12, 2021

All Space Missions from 1957

This article is about space missions from 1957 onward. According to Wikipedia, "Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consider it, with time, to be part of a boundless four-dimensional continuum known as spacetime." Outer space, the region beyond our atmosphere, is not a perfect void; it still contains particles.

Congestion in the sky.

Humans began exploring space on October 4, 1957, when the Union of Soviet Socialist Republics (U.S.S.R.) launched Sputnik, the first artificial satellite to orbit Earth. This happened during the Cold War, the period of political hostility between the Soviet Union and the United States.

Since then, companies from many countries around the world have also explored space and carried out their own missions. Alongside the successes, there have been failures too. Agirlcoding, Consultant at Deloitte, Düsseldorf, North Rhine-Westphalia, Germany, provided a data set that records the space missions from the very beginning. The data set can be found here

Before a data set is visualized and analyzed, it is wrangled first, which means cleaning it up. For this we will use Python, a programming language, together with ‘pandas’, its library for this kind of work. Below, ‘pandas’ is imported and the data set, which is a CSV (comma-separated values) file, is read into the program with ‘read_csv’. After that, the first 5 rows are displayed with ‘head()’.

import pandas as pd
df = pd.read_csv('Space_Corrected.csv')
df.head()
The head

We can see that the data looks somewhat messy: the first two columns have no real names and seem to hold the same values. So the next thing to do is to check the information of the data set with the built-in pandas method ‘info()’.

df.info()
The Information of the data set
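As a side note on the two unnamed columns: one common way such columns appear is when a DataFrame is written to CSV together with its index and later read back. The sketch below shows this on a tiny in-memory CSV. The rows are made up for illustration and are not taken from the data set, and this is only an assumption about how the columns in ‘Space_Corrected.csv’ came about.

```python
import io

import pandas as pd

# Hypothetical toy data, not rows from Space_Corrected.csv
toy = pd.DataFrame(
    {"Company Name": ["A", "B"], "Status Mission": ["Success", "Failure"]}
)

# to_csv() writes the index by default, as a column with an empty header
csv_text = toy.to_csv()

# Reading the text back labels that header-less column 'Unnamed: 0'
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.columns.tolist())

# Telling read_csv that the first column is the index avoids the extra column
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(as_index.columns.tolist())
```

Saving with ‘index=False’ in ‘to_csv’ would avoid writing the extra column in the first place.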

Okay! From the ‘info()’ output, we have 4324 rows and 9 columns in total. We want to check whether the first two columns are equal and, if they are, remove them with the ‘drop’ function. Its first argument is the list of columns we want to remove, and ‘axis=1’ says that we are dropping columns rather than rows.

# Compare the two columns element by element and drop both if they match
if (df['Unnamed: 0.1'] == df['Unnamed: 0']).all():
    items = ['Unnamed: 0.1', 'Unnamed: 0']
    df.drop(items, axis=1, inplace=True)
df
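To make the equality check concrete, here is a minimal sketch on a made-up DataFrame (the column names ‘a’, ‘b’ and ‘c’ are hypothetical). It compares columns element by element, and it also shows why reducing each column with ‘.all()’ before comparing is not a reliable test: ‘.all()’ collapses a whole column to a single boolean, so two different columns can still compare as equal.

```python
import pandas as pd

# Hypothetical toy data: 'a' and 'b' are identical, 'c' is different
toy = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [3, 2, 1]})

# Element-wise comparison, then check that every row matched
same_ab = (toy["a"] == toy["b"]).all()
same_ac = (toy["a"] == toy["c"]).all()

# Reducing first: each side becomes True (all values are non-zero),
# so the comparison says "equal" even though the columns differ
reduced_ac = toy["a"].unique().all() == toy["c"].unique().all()

# Drop the redundant column only when the element-wise check passes
if same_ab:
    toy = toy.drop(columns=["b"])

print(same_ab, same_ac, reduced_ac)
print(toy.columns.tolist())
```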

From the ‘info()’ output we can also see that the Rocket column has only 964 non-null values out of 4324, so next we check the number of null values in each column of the data set.

df.isnull().sum()
Sum of null values
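For readers new to this call, the sketch below shows on made-up rows what ‘isnull().sum()’ computes: ‘isnull()’ marks every missing cell as True, and ‘sum()’ counts those marks per column. The values are hypothetical and only the column names echo the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values in one column
toy = pd.DataFrame(
    {
        "Rocket": ["50.0", np.nan, np.nan],
        "Status Mission": ["Success", "Failure", "Success"],
    }
)

# One count of missing cells per column
null_counts = toy.isnull().sum()
print(null_counts)

# Fraction of missing values in a single column
missing_share = toy["Rocket"].isnull().mean()
print(round(missing_share, 2))
```

The same arithmetic applies to the real data: 4324 rows minus 964 non-null values leaves 3360 missing values.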

Okay! We have 3360 null values (4324 - 964), and we don’t want to lose that much data by using the ‘dropna’ function. Instead we replace them with a value that fits, and that value is ‘0.0’.

# Assign the result back; inplace=True on a selected column may not update df in newer pandas
df['Rocket'] = df['Rocket'].fillna('0.0')
df.isnull().sum()
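Here is a small sketch of the fill step on made-up values. It also shows one possible follow-up, which is an assumption about what the later analysis needs rather than part of the steps above: converting the filled text column to numbers with ‘pd.to_numeric’. The column name ‘Rocket_num’ is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: values stored as text, one of them missing
toy = pd.DataFrame({"Rocket": ["50.0", np.nan, "29.75"]})

# Replace missing values with the string '0.0' and assign the result back
toy["Rocket"] = toy["Rocket"].fillna("0.0")
remaining_nulls = int(toy["Rocket"].isnull().sum())

# Optional: a numeric copy of the column; text that cannot be parsed becomes NaN
toy["Rocket_num"] = pd.to_numeric(toy["Rocket"].str.strip(), errors="coerce")

print(remaining_nulls)
print(toy["Rocket_num"].tolist())
```

One thing to keep in mind with this choice: after filling, a zero counts as a real value in any later average or total.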

Okay, now we don’t have any null values, but the data set is only half clean until we have checked for duplicates. So we check whether there are duplicate rows and, if there are, we clean them up by dropping them.

df.duplicated().sum()
df.drop_duplicates(inplace=True)
df.duplicated().sum()
Duplicates in data set
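The sketch below, on made-up rows, shows how ‘duplicated()’ and ‘drop_duplicates()’ behave. ‘duplicated()’ compares whole rows, so a per-row counter column (here the hypothetical ‘row_id’) makes every row look unique. That is one reason to drop such columns before checking for duplicates.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 repeat the same mission
toy = pd.DataFrame(
    {
        "row_id": [0, 1, 2],
        "Company Name": ["A", "A", "B"],
        "Datum": ["d1", "d1", "d2"],
    }
)

# With the counter column present, no row is an exact copy of another
dupes_with_id = int(toy.duplicated().sum())

# Without it, the second row is flagged as a duplicate of the first
without_id = toy.drop(columns=["row_id"])
dupes_without_id = int(without_id.duplicated().sum())

# Keep the first occurrence of each row and renumber the index
deduped = without_id.drop_duplicates().reset_index(drop=True)

print(dupes_with_id, dupes_without_id, len(deduped))
```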

With that done, our data set is ready for visualization and analysis. Here is the link to the GitHub repo.

Thanks for reading! Follow along for the visualization and analysis of this data.


Emmanuel Olaosebikan

A student of the Federal University of Agriculture, Abeokuta, skilled in web development, graphic design and Python programming.