3 Examples Using Pivot Table in Pandas

One of the most common tasks in data science is to manipulate the data frame we have to a specific format. For example, sometime we may want to take data frame with fewer columns, say in long format, summarize and convert into a data frame with multiple columns, i.e. a wide data frame.

Pandas Pivot Example
Pandas Pivot Example

Python Pandas function pivot_table help us with the summarization and conversion of dataframe in long form to dataframe in wide form, in a variety of complex scenarios.

In Pandas, the pivot table function takes simple data frame as input, and performs grouped operations that provides a multidimensional summary of the data. Yes, in a way, it is related Pandas group_by function. Jake Vanderplas nicely explains pivot_table in his Python Data Science Handbook as

essentially a multidimensional version of GroupBy aggregation. That is, you split-apply-combine, but both the split and the combine happen across not a one-dimensional index, but across a two-dimensional grid.

Here are 3 examples of using pivot in Pandas with pivot_Table. We will use Pandas’ pivot_table function to summarize and convert our two/three column dataframe to multiple column dataframe.

Let us firs load Python pandas.

import pandas as pd

Let us use the gapminder data first create a data frame with just two columns.

>data_url = 'http://bit.ly/2cLzoxH'
>gapminder = pd.read_csv(data_url)
>print(gapminder.head(3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

1. Pandas pivot Simple Example

Let us see a simple example of Python Pivot using a dataframe with jus two columns. Let us subset our dataframe to contain just two columns, continent and lifeExp

# select two columns from gapminder dataframe
>df = gapminder[['continent','lifeExp']]
>print(df.shape)
(1704, 2)
Pandas Pivot Example

We can see that df is a data frame in long format with two columns. As a simple example, we can use Pandas pivot_table to convert the tall table to a wide table, computing the mean lifeExp across continents. To do that, we will use pd.pivot_table with the data frame as one of the arguments and specify which variable we would like use for columns and which variable we would like to summarize. One of the arguments of pivot_table, agg_func has mean as default.

# simple example with pivot_table
>pd.pivot_table(df, values='lifeExp', 
                    columns='continent')

continent	Africa	Americas	Asia	Europe	Oceania
lifeExp	48.86533	64.658737	60.064903	71.903686	74.326208

Let us see another simple example of pivot_table. In the above example we used pvot_table to compute mean lifeExp for each continent. We can compute mean lifeExp for each country, by simply specifying “country” for columns argument.

# simple example with pivot_table
>df = gapminder[['country','lifeExp']]
# print dataframe shape
>print(df.shape)
# pivot_table example
>pd.pivot_table(df, values='lifeExp',
                    columns='country')

And we will get mean lifeExp for each country in wide format.

2. Pandas pivot_table on a data frame with three columns

Pandas pivot_table gets more useful when we try to summarize and convert a tall data frame with more than two variables into a wide data frame.

Let us say we have dataframe with three columns/variables and we want to convert this into a wide data frame have one of the variables summarized for each value of the other two variables.

Let us use three columns; continent, year, and lifeExp, from gapminder data and use pivot_table to compute mean lifeExp for each continent and year. Let us use ‘continent’ on columns and year as ‘index’

# select three columns from gapminder dataframe
>df1 = gapminder[['continent', 'year','lifeExp']]
# pivot table example with three columns
>pd.pivot_table(df1, values='lifeExp', 
                     index=['year'], 
                     columns='continent')

continent     Africa  Americas       Asia     Europe  Oceania
year                                                         
1952       39.135500  53.27984  46.314394  64.408500   69.255
1957       41.266346  55.96028  49.318544  66.703067   70.295
1962       43.319442  58.39876  51.563223  68.539233   71.085
1967       45.334538  60.41092  54.663640  69.737600   71.310
1972       47.450942  62.39492  57.319269  70.775033   71.910

3. Pandas pivot_table with Different Aggregating Function

As mentioned before, pivot_table uses mean function for aggregating or summarizing data by default.  We can change the aggregating function,  if needed. For example, we can use aggfunc=’min’ to compute “minimum” lifeExp instead of “mean” lifeExp for each year and continent values.

# pivot table example with aggfunc='min'
>pd.pivot_table(df1, values='lifeExp', 
                    index=['year'], 
                    columns='continent',
                    aggfunc='min')

continent  Africa  Americas    Asia  Europe  Oceania
year                                                
1952       30.000    37.579  28.801  43.585    69.12
1957       31.570    40.696  30.332  48.079    70.26
1962       32.767    43.428  31.997  52.098    70.93
1967       34.113    45.032  34.020  54.336    71.10
1972       35.400    46.714  36.088  57.005    71.89

Not only we can specify what aggregating function we want, we can also specify more than one aggregating functions. For example, if we are interested in bot minimum and maximum values of lifeExp for each year and continent, we can specify the functions as a list to the argument ‘aggfunc’

# pivot table example with multiple aggregating functions
>pd.pivot_table(df1, values='lifeExp',
                     index=['year'],
                     columns='continent',
                     aggfunc=[min,max])
	                             min	max
continent	Africa	Americas	Asia	Europe	Oceania	Africa	Americas	Asia	Europe	Oceania
year										
1952	30.000	37.579	28.801	43.585	69.12	52.724	68.75	65.39	72.67	69.39
1957	31.570	40.696	30.332	48.079	70.26	58.089	69.96	67.84	73.47	70.33
1962	32.767	43.428	31.997	52.098	70.93	60.246	71.30	69.39	73.68	71.24

When we give multiple aggregating functions, we would get a multi-indexed data frame as output.