Learn Pandas for Data Science (Course IV)

Data Structures in Pandas: Series

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). Put simply, a Series can be thought of as a column in an Excel sheet. In this chapter, you will learn several ways to create a Pandas Series and how to perform some basic operations on one.

The syntax for creating a simple series using pandas is:

series = pandas.Series(data, index=index)

Here, data can be a Python list or dictionary, a one-dimensional NumPy array (ndarray), or a scalar value (like 10). In real-world use, the data is usually loaded from an external source such as a database, a CSV file, or an Excel file. The optional index argument supplies the row labels, for example as a Python list. For a list or array, it must have the same length as the data; for a scalar, the value is repeated once for each label; for a dictionary, it selects which keys to include and in what order.
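As a minimal sketch of how the data and index arguments interact, the snippet below builds a Series from a NumPy array and from a scalar, and then shows what happens when the index length does not match a list. The variable names (from_array, from_scalar, mismatch_raised) and the sample values are illustrative assumptions, not part of the original course material.

```python
import numpy as np
import pandas as pd

# From a one-dimensional NumPy array with explicit labels;
# the index needs one label per value.
from_array = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

# From a scalar: the value is repeated once for every index label.
from_scalar = pd.Series(10, index=["x", "y", "z"])

print(from_array)
print(from_scalar)

# A list with a shorter index is rejected with a ValueError.
mismatch_raised = False
try:
    pd.Series([1, 2, 3], index=["a", "b"])
except ValueError as err:
    mismatch_raised = True
    print("Mismatched lengths:", err)
```

When no index is given, pandas falls back to a default integer index starting at 0.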

How to Create Pandas Series?

A Pandas Series can be created in several ways, some of which are explained below:

Creating a Pandas Series from a Python List

A Python list can be passed to the pandas.Series() constructor to build a Series, as shown in the following example:

# Making necessary imports
import pandas as pd
import numpy as np

# Defining data as a list
data = [1, 8, 5, np.nan, 6, 7] # Note: NaN (Not a Number) is the standard missing data marker used in pandas.

# Creating a pandas series from the list
s = pd.Series(data, name="defined series")

print(s)        # printing the series
print(s.index)  # printing the index

OUTPUT:

0    1.0
1    8.0
2    5.0
3    NaN
4    6.0
5    7.0
Name: defined series, dtype: float64
RangeIndex(start=0, stop=6, step=1)
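Note that the output above has dtype float64 even though most of the values were written as integers: NaN is a floating-point value, so its presence makes the whole Series float. A short sketch of a few common ways to inspect and handle the missing entry follows; it reuses the same data as the example above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 8, 5, np.nan, 6, 7], name="defined series")

print(s.dtype)         # float64, because NaN is a floating-point value
print(s.isna())        # boolean series, True only where the value is missing
print(s.isna().sum())  # number of missing values
print(s.dropna())      # copy of the series without the missing entry
print(s.mean())        # NaN is skipped by default (skipna=True)
```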

Creating a Pandas Series from a Python Dictionary

We can pass a Python dictionary to the pandas.Series() constructor to create a new Series. The keys of the dictionary become the index labels of the new Series. Run the following block of code to construct a Pandas Series from a dictionary and print its contents.

# Making necessary imports
import pandas as pd

# Defining data as a python dictionary
data = {'a' : 1, 'b' : 2, 'c' : 3}

# Creating a pandas series from the dictionary
s = pd.Series(data)

print(s) #printing the series
print(s.index) #printing the index

# Look up a value by its label, as with a dictionary
print("Value of b is:", s['b'])  # prints the value at index label 'b'

OUTPUT:

a    1
b    2
c    3
dtype: int64
Index(['a', 'b', 'c'], dtype='object')
Value of b is: 2
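The index argument can also be combined with a dictionary. The sketch below, which assumes the same data dictionary as above plus a hypothetical extra label "d", shows that the index then selects and orders the keys, and that a label missing from the dictionary produces NaN.

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# Select and reorder keys with index=; "d" is not in the dictionary,
# so its value is missing (NaN) and the dtype becomes float64.
s = pd.Series(data, index=["c", "a", "d"])
print(s)

# Dictionary-style membership test and safe lookup on the labels
print("a" in s)       # checks the index labels, not the values
print(s.get("z", 0))  # returns the default when the label is absent
```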

Basic operations on Pandas Series

Pandas Series can be thought of as vectors in linear algebra: arithmetic operations such as addition, subtraction, multiplication, and division are applied element by element, with values matched by index label. Series also support slicing, similar to Python list slicing.

# Making necessary imports
import pandas as pd
import numpy as np

# Creating Pandas series
s = pd.Series([1, 8, 5, 25, 6, 7], name="new series")
print("s: \n", s)

# Arithmetic operations on a series
print("\ns+s: \n", s+s)          # vector addition
print("\ns-s: \n", s-s)          # vector subtraction
print("\ns*2: \n", s*2)          # multiplying vector by a scalar
print("\ns/2: \n", s/2)          # dividing vector by a scalar
print("\ne^s: \n", np.exp(s))    # element-wise exponential, e**x
print("\nlog(s): \n", np.log(s)) # element-wise natural logarithm

# Slicing operations on a series
# Similar to Python list slicing, i.e., seriesName[start:end:step]
print("\ns[:2]: \n", s[:2])       # prints the first two rows
print("\ns[1:4:1]: \n", s[1:4:1]) # prints rows 1 to 3 (values 8, 5, 25)

OUTPUT:

s: 
0     1
1     8
2     5
3    25
4     6
5     7
Name: new series, dtype: int64

s+s:
0     2
1    16
2    10
3    50
4    12
5    14
Name: new series, dtype: int64

s-s:
0    0
1    0
2    0
3    0
4    0
5    0
Name: new series, dtype: int64

s*2:
0     2
1    16
2    10
3    50
4    12
5    14
Name: new series, dtype: int64

s/2:
0     0.5
1     4.0
2     2.5
3    12.5
4     3.0
5     3.5
Name: new series, dtype: float64

e^s:
0    2.718282e+00
1    2.980958e+03
2    1.484132e+02
3    7.200490e+10
4    4.034288e+02
5    1.096633e+03
Name: new series, dtype: float64

log(s):
0    0.000000
1    2.079442
2    1.609438
3    3.218876
4    1.791759
5    1.945910
Name: new series, dtype: float64

s[:2]:
0    1
1    8
Name: new series, dtype: int64

s[1:4:1]:
1     8
2     5
3    25
Name: new series, dtype: int64
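One difference from plain vectors is worth seeing in code: when two Series have different indexes, arithmetic pairs values by label rather than by position. The following is a small sketch with made-up Series a and b (both names and values are assumptions for illustration); it also contrasts position-based slicing with .iloc against label-based slicing with .loc.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Arithmetic matches values by label, not by position.
# Labels present in only one operand produce NaN.
total = a + b
print(total)

# Treat a label missing from one side as 0 instead of NaN
filled = a.add(b, fill_value=0)
print(filled)

# Position-based slicing excludes the end; label-based slicing includes it.
print(a.iloc[0:1])     # only the row at position 0
print(a.loc["x":"y"])  # labels "x" through "y", inclusive
```

Using .iloc and .loc explicitly avoids ambiguity about whether a slice refers to positions or labels, which matters most when the index itself holds integers.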

Now you know how to create a Pandas Series and perform various operations on it. The next chapter will introduce you to another commonly used data structure in Pandas: the DataFrame.
