In this tutorial, we will see how to compute the percent change for the values in each column. By default, Pandas’ pct_change() function computes the percent change of each value in a column relative to the previous element in that column. Another way to think of it: it computes the percentage change from the immediately previous row.
Pandas’ pct_change() function is extremely handy for comparing percent changes in time series data.
First, let us load the Pandas library and create some toy time series data.
import pandas as pd
Let us create a dataframe with the top tech companies' earnings over the last four years. We have the earnings per company as lists.
year = [2017, 2018, 2019, 2020]
facebook = [15934000000, 22112000000, 18485000000, 29146000000]
google = [12662000000, 30736000000, 34343000000, 40269000000]
microsoft = [25489000000, 16571000000, 39240000000, 44281000000]
We can create a Pandas dataframe from these four lists.
df = pd.DataFrame({"facebook":facebook, "google": google, "microsoft": microsoft}, index=year)
In our toy time series data we have three columns (companies) and the year as index.
df
         facebook       google    microsoft
2017  15934000000  12662000000  25489000000
2018  22112000000  30736000000  16571000000
2019  18485000000  34343000000  39240000000
2020  29146000000  40269000000  44281000000
Pandas pct_change() function to compute percent change
We can use the pct_change() function to compute the percent change in earnings for each company compared to the previous year.
df.pct_change()
You can see that the first row is NaN as there is nothing before that. And for the rest of the rows we have percent change in earnings with respect to previous year.
      facebook    google  microsoft
2017       NaN       NaN        NaN
2018  0.387724  1.427421  -0.349876
2019 -0.164029  0.117354   1.367992
2020  0.576738  0.172553   0.128466
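The numbers above can be reproduced by hand, which makes it clearer what pct_change() is doing. The sketch below rebuilds the same toy dataframe and compares the built-in result with a manual calculation using shift(); the variable names manual and builtin are only illustrative.

```python
import pandas as pd

year = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    {
        "facebook": [15934000000, 22112000000, 18485000000, 29146000000],
        "google": [12662000000, 30736000000, 34343000000, 40269000000],
        "microsoft": [25489000000, 16571000000, 39240000000, 44281000000],
    },
    index=year,
)

# Percent change by hand: divide each row by the previous row, then subtract 1.
# shift(1) moves every value down one row, so row t lines up with row t-1.
manual = df / df.shift(1) - 1

# Built-in version for comparison.
builtin = df.pct_change()

# Raises an AssertionError if the two results differ beyond float tolerance.
pd.testing.assert_frame_equal(manual, builtin)

# The result is a fraction; multiply by 100 to read it as a percentage.
print((builtin * 100).round(2))
```

Note that the values are fractions, so 0.387724 means an increase of about 38.8 percent.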
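By default, the pct_change() function computes the change down the rows, comparing each row with the one before it. This is the same as passing axis='rows' explicitly.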
df.pct_change(axis='rows')
We can also compute the percent change across columns using the argument axis="columns". With our data laid out as years by companies, a column-wise change would compare one company with another, which does not make sense here. So we transpose the dataframe first, so that the years run along the columns, and then use pct_change().
df.T.pct_change(axis="columns")
           2017      2018      2019      2020
facebook    NaN  0.387724 -0.164029  0.576738
google      NaN  1.427421  0.117354  0.172553
microsoft   NaN -0.349876  1.367992  0.128466
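As a quick sanity check, the transposed, column-wise result should hold exactly the same numbers as the default row-wise result, just flipped. The sketch below rebuilds the toy dataframe and verifies that; the names wide and long_form are only illustrative.

```python
import pandas as pd

year = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    {
        "facebook": [15934000000, 22112000000, 18485000000, 29146000000],
        "google": [12662000000, 30736000000, 34343000000, 40269000000],
        "microsoft": [25489000000, 16571000000, 39240000000, 44281000000],
    },
    index=year,
)

# Companies as rows, years as columns; change is computed left to right.
wide = df.T.pct_change(axis="columns")

# Default row-wise change on the original layout, transposed afterwards.
long_form = df.pct_change().T

# Both routes should give the same table.
pd.testing.assert_frame_equal(wide, long_form)
print(wide)
```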
We can also specify how many rows back to compare against using the "periods" argument. For example, to compute the percent change with respect to two years (rows) before, we use periods=2. Here we compare earnings from 2019 to 2017 and from 2020 to 2018. Because of this, we have NaN in the first two rows. This argument is extremely useful for comparing quarterly earnings/revenue changes; for example, with quarterly data periods=4 compares each quarter with the same quarter a year earlier.
df.pct_change(periods=2)
      facebook    google  microsoft
2017       NaN       NaN        NaN
2018       NaN       NaN        NaN
2019  0.160098  1.712289   0.539488
2020  0.318108  0.310157   1.672198
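To see what periods=2 does under the hood, we can compare it with a manual calculation that shifts the data by two rows. This is a minimal sketch on the same toy data; the names two_year and manual are only illustrative.

```python
import pandas as pd

year = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    {
        "facebook": [15934000000, 22112000000, 18485000000, 29146000000],
        "google": [12662000000, 30736000000, 34343000000, 40269000000],
        "microsoft": [25489000000, 16571000000, 39240000000, 44281000000],
    },
    index=year,
)

# Compare each row with the row two positions earlier.
two_year = df.pct_change(periods=2)

# Same thing by hand: shift by two rows, divide, subtract 1.
manual = df / df.shift(2) - 1

pd.testing.assert_frame_equal(two_year, manual)

# The first two rows have no row two steps back, so they are NaN.
print(two_year)
```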
Compute Percent change with missing data with pct_change()
Another useful feature of pct_change() function is that it can handle missing data. Let us create a list with missing values.
google = [12662, 30736, None, 40269]
year = [2017, 2018, 2019, 2020]
facebook = [15934, 22112, 18485, 29146]
microsoft = [25489, 16571, 39240, 44281]
We can create a dataframe using the list with missing values.
df = pd.DataFrame({"facebook":facebook, "google": google, "microsoft": microsoft}, index=year)
We can see that the Google column has a single missing value.
df
      facebook   google  microsoft
2017     15934  12662.0      25489
2018     22112  30736.0      16571
2019     18485      NaN      39240
2020     29146  40269.0      44281
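By default, pct_change() handles missing data with the fill method "pad" (forward fill), which propagates the last valid observation forward to fill the gap. Google's missing 2019 value is therefore treated as equal to its 2018 value, so the 2019 change is 0 and the 2020 change is measured against 2018.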
df.pct_change()
      facebook    google  microsoft
2017       NaN       NaN        NaN
2018  0.387724  1.427421  -0.349876
2019 -0.164029  0.000000   1.367992
2020  0.576738  0.310157   0.128466
The argument to specify fill method is fill_method.
df.pct_change(fill_method="bfill")
      facebook    google  microsoft
2017       NaN       NaN        NaN
2018  0.387724  1.427421  -0.349876
2019 -0.164029  0.310157   1.367992
2020  0.576738  0.000000   0.128466
Another way to fill the missing values is forward fill, using fill_method="ffill". This gives the same result as the default.
df.pct_change(fill_method="ffill")
      facebook    google  microsoft
2017       NaN       NaN        NaN
2018  0.387724  1.427421  -0.349876
2019 -0.164029  0.000000   1.367992
2020  0.576738  0.310157   0.128466
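One caveat: newer Pandas releases (2.1 and later, as far as we know) deprecate passing a fill method to pct_change() and recommend filling the data explicitly first. The sketch below shows that alternative, assuming the same toy data with one missing Google value; it calls ffill() or bfill() before pct_change() and passes fill_method=None so no implicit filling happens. The names forward, backward and no_fill are only illustrative.

```python
import pandas as pd

year = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    {
        "facebook": [15934, 22112, 18485, 29146],
        "google": [12662, 30736, float("nan"), 40269],  # 2019 is missing
        "microsoft": [25489, 16571, 39240, 44281],
    },
    index=year,
)

# Forward fill first: 2019 takes the 2018 value, so its change is 0.
forward = df.ffill().pct_change(fill_method=None)

# Backward fill first: 2019 takes the 2020 value, so the 2020 change is 0.
backward = df.bfill().pct_change(fill_method=None)

# No filling at all: any change that touches the missing value stays NaN.
no_fill = df.pct_change(fill_method=None)

print(forward["google"])
print(backward["google"])
print(no_fill["google"])
```

Filling explicitly keeps the choice of fill strategy visible in the code, and leaving the gap unfilled is often the safer option when a made-up 0 percent change would be misleading.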