Learn Pandas for Data Science (Course IV)

Inspecting data using Pandas

When working with data, inspecting it first is essential. Summary values such as count, mean, standard deviation, minimum and maximum values, and data types tell us a lot about the data we're working with. Pandas provides simple methods for getting these basic insights from a DataFrame. In this chapter, you will learn about some of those methods.

For this chapter, we will be using the COVID-19 Dataset from Kaggle. Download the data from this link and save the file as data.csv in the same folder as your Python notebook. Then load the data into your notebook as follows:

# Making necessary imports
import pandas as pd

# Loading the dataset
df = pd.read_csv("data.csv")

Note: This dataset is updated frequently, so the values shown in this example may differ slightly from what you see when you try the dataset yourself. The process remains the same.

Display top n rows of a Pandas DataFrame

The pandas.DataFrame.head method returns the first n rows of the DataFrame, which a notebook then displays.

# Display top 3 rows
df.head(n=3)

If the number of rows (n) is not specified, the top 5 rows are displayed by default.

# Displays top 5 rows by default
df.head()
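
As a minimal sketch of how head behaves, the snippet below uses a small hand-made DataFrame named toy. Its values are illustrative placeholders, not figures from the Kaggle dataset. It also shows that a negative n returns every row except the last |n| rows.

```python
import pandas as pd

# Toy DataFrame with placeholder values (NOT real COVID-19 figures)
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E", "F", "G"],
    "Confirmed": [10, 20, 30, 40, 50, 60, 70],
})

first_three = toy.head(3)         # first 3 rows
default_head = toy.head()         # first 5 rows (default n=5)
all_but_last_two = toy.head(-2)   # negative n: every row except the last 2

print(len(first_three), len(default_head), len(all_but_last_two))
```

Because head returns a new DataFrame, the result can be stored in a variable or chained with other methods instead of only being displayed.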

Display bottom n rows of a Pandas DataFrame

The pandas.DataFrame.tail method is used to display the bottom n rows of the DataFrame. Like the pandas.DataFrame.head method, it displays 5 rows by default when no number is passed to it.

# Display bottom 5 rows
df.tail()
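
The following sketch, again using a hypothetical toy DataFrame with placeholder values, shows that tail keeps the original index labels of the rows it returns, which makes it easy to see where the rows sit in the full table.

```python
import pandas as pd

# Toy DataFrame with placeholder values (NOT real COVID-19 figures)
toy = pd.DataFrame({
    "Country/Region": list("ABCDEFG"),
    "Deaths": [1, 2, 3, 4, 5, 6, 7],
})

last_two = toy.tail(2)     # last 2 rows
default_tail = toy.tail()  # last 5 rows (default n=5)

# The index labels are carried over from the original DataFrame
print(last_two.index.tolist())
```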

Display all the Column Names

Printing the whole DataFrame just to see the column names is often not practical, especially when there are many columns. In such cases, we can use the pandas.DataFrame.columns attribute to get all the column names. Note that columns is an attribute, not a method, so it is accessed without parentheses.

# Display all the column names
df.columns
Index(['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases',
'Recovered / 100 Cases', 'Deaths / 100 Recovered', 'Confirmed last week',
'1 week change', '1 week % increase', 'WHO Region'],
dtype='object')
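
The Index object returned by columns can be converted to a plain Python list or used for membership checks. The sketch below assumes a hypothetical empty DataFrame named toy that only borrows a few column names from the dataset above.

```python
import pandas as pd

# Empty toy DataFrame that reuses a few column names from the dataset
toy = pd.DataFrame(columns=["Country/Region", "Confirmed", "Deaths", "WHO Region"])

column_names = toy.columns.tolist()   # plain Python list of names
has_deaths = "Deaths" in toy.columns  # membership check on the Index
n_rows, n_cols = toy.shape            # (number of rows, number of columns)

print(column_names)
print(has_deaths, n_rows, n_cols)
```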

Display Descriptive Statistics of the DataFrame

The pandas.DataFrame.describe method displays descriptive statistics for the columns of the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartiles. By default, only numeric columns are summarized. These statistics help us understand the data well.

# Display descriptive statistics of the DataFrame
df.describe()
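
As a rough illustration, the sketch below runs describe on a hypothetical toy DataFrame (placeholder values, not real figures). It shows that the result is itself a DataFrame that can be indexed by statistic and column, and that passing include="all" also summarizes non-numeric columns.

```python
import pandas as pd

# Toy DataFrame with placeholder values (NOT real COVID-19 figures)
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [10, 20, 30, 40],
    "Deaths": [1, 2, 3, 4],
})

numeric_summary = toy.describe()             # numeric columns only
full_summary = toy.describe(include="all")   # also adds unique/top/freq rows

# Look up a single statistic by row label and column name
confirmed_mean = numeric_summary.loc["mean", "Confirmed"]
print(confirmed_mean)
```

To summarize only a few columns, select them first, for example toy[["Confirmed"]].describe().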

Display Data Type, Non-Null Values and Memory Usage about a Pandas DataFrame

The pandas.DataFrame.info method is used to display the index data type and column data type, the number of non-null values, and memory usage.

# Display further information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 15 columns):
#    Column                 Non-Null Count    Dtype
---  ------                 --------------    -----
0    Country/Region         187 non-null      object
1    Confirmed              187 non-null      int64
2    Deaths                 187 non-null      int64
3    Recovered              187 non-null      int64
4    Active                 187 non-null      int64
5    New cases              187 non-null      int64
6    New deaths             187 non-null      int64
7    New recovered          187 non-null      int64
8    Deaths / 100 Cases     187 non-null      float64
9    Recovered / 100 Cases  187 non-null      float64
10   Deaths / 100 Recovered 187 non-null      float64
11   Confirmed last week    187 non-null      int64
12   1 week change          187 non-null      int64
13   1 week % increase      187 non-null      float64
14   WHO Region             187 non-null      object
dtypes: float64(4), int64(9), object(2)
memory usage: 22.0+ KB
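
The info method prints its report rather than returning it, so the same facts are sometimes easier to use through separate attributes and methods. The sketch below uses a hypothetical toy DataFrame with placeholder values and one deliberately missing entry. The + in the memory usage line above indicates that the memory held by object columns is only a lower bound; memory_usage(deep=True) measures it fully.

```python
import pandas as pd

# Toy DataFrame with placeholder values (NOT real COVID-19 figures);
# None stands in for a missing value
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "Confirmed": [10, 20, 30],
    "Deaths / 100 Cases": [1.5, None, 2.5],
})

column_dtypes = toy.dtypes                   # data type of each column
non_null_counts = toy.count()                # non-null values per column
deep_bytes = toy.memory_usage(deep=True).sum()  # total bytes, strings included

print(column_dtypes)
print(non_null_counts)
print(deep_bytes)
```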

In this chapter, you learned about Pandas methods that help you understand your data. In the next chapter, you will learn about Pandas methods for manipulating data during preprocessing.
