11 Tips to Make Plots with Pandas

Overlapping Histograms with Pandas

Pandas Plotting Tips
The Python library Pandas is well known for its data munging capabilities. A somewhat underused feature of Pandas, however, is its plotting support. Matplotlib, Seaborn, or Altair give more control over the final visualization, but Pandas plotting can be extremely handy when you are in exploratory data analysis mode and want to make quick visualizations on the fly.

In this post, we will see 11 tips, with complete code and data, for making the most of Pandas plotting for commonly used data visualizations. We will mostly use Pandas’ plot() function to make quick exploratory visualizations, including line plots, histograms, scatter plots, boxplots, barplots, and density plots.

Let us load Pandas, NumPy, and matplotlib to make plots with Pandas.

# import pandas
import pandas as pd
# import numpy
import numpy as np
# import matplotlib
import matplotlib.pyplot as plt

We will use gapminder data in this post.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710

One of the good things about plotting with Pandas is that the plot() function can handle many common plot types. Most of our examples use plot() together with a specific plot kind, such as plot.line() or plot.hist().

1. Line Plots with Pandas

We can make line plots with Pandas using the plot.line() accessor, chained directly to the dataframe as df.plot.line(). We need to specify which dataframe columns go on the x-axis and the y-axis.

The figsize argument of plot.line() sets the plot size. In this example, we pass the size as the tuple (8,6). We also save the plot using matplotlib.pyplot’s savefig() function.

df_uk = gapminder.query('country=="United Kingdom"')
df_uk.plot.line(x='lifeExp', y='gdpPercap', figsize=(8,6))
plt.savefig("Line_Plot_with_Pandas_Python.jpg")
Line Plot with Pandas
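
Because Pandas draws with matplotlib, plot.line() hands back a matplotlib Axes object that can be labeled and saved directly. The sketch below is only an illustration: the dataframe toy, its values, and the output file name are made up and are not gapminder data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Made-up values for illustration only (not taken from gapminder)
toy = pd.DataFrame({
    "year": [1990, 1995, 2000, 2005],
    "lifeExp": [70.0, 71.5, 73.0, 74.2],
})

# plot.line() returns the matplotlib Axes it drew on
ax = toy.plot.line(x="year", y="lifeExp", figsize=(8, 6))
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy")

# Save through the Axes' parent figure instead of the global pyplot state
out_path = os.path.join(tempfile.gettempdir(), "toy_line_plot.png")
ax.figure.savefig(out_path, dpi=100)
```

Saving through ax.figure avoids relying on whichever figure pyplot currently considers active, which helps when several plots are made in one session.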

2. Histogram with Pandas

We can make a histogram by calling plot.hist() on the Series containing the variable. In this example, we make a histogram of the lifeExp variable from the gapminder dataframe. One of the key arguments to the histogram function is the number of bins; here we set it to 100 with bins=100.

gapminder['lifeExp'].plot.hist(bins=100, figsize=(8,6))
Histograms with Pandas

We can also make multiple overlapping histograms with Pandas’ plot.hist() function. However, plot.hist() expects the dataframe to be in wide form, with each group that should get its own histogram in a separate column.

We can reshape our dataframe from long form to wide form using the pivot() function, as shown below.

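# keep only the two columns we need, still in long form
df2 = gapminder[['continent', 'lifeExp']]
df2_wide = df2.pivot(columns='continent', values='lifeExp')
df2_wide.head(n=3)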
continent	Africa	Americas	Asia	Europe	Oceania
0	NaN	NaN	28.801	NaN	NaN
1	NaN	NaN	30.332	NaN	NaN
2	NaN	NaN	31.997	NaN	NaN
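
The NaN values in the wide table follow from how pivot() works when only columns and values are given: the existing row index is kept, so every row ends up with exactly one non-missing entry, in the column of its own continent. A minimal sketch, assuming a made-up long-form dataframe (long_df and its values are illustrative, not gapminder data):

```python
import pandas as pd

# Made-up long-form data: one row per observation
long_df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Africa", "Europe", "Africa"],
    "lifeExp": [30.0, 32.0, 70.0, 40.0, 72.0, 42.0],
})

# Spread the continents into columns; the row index stays as it was
wide_df = long_df.pivot(columns="continent", values="lifeExp")

print(wide_df)
print(wide_df.notna().sum(axis=1).tolist())  # non-missing entries per row
print(wide_df.count().to_dict())             # non-missing entries per column
```

The wide frame has as many rows as the long one, and the histogram, box, and density plots drop the missing values column by column, so the NaN padding does not distort the plots.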

Now each group of the histogram is a separate variable in the dataframe and we can use plot.hist() to make overlapping histograms.

df2_wide.plot.hist(bins=100, figsize=(8,6), alpha=0.7)
plt.savefig("multiple_overlapping_histograms_with_Pandas_Python.jpg")

Pandas gives each group its own color. In this example, alpha=0.7 makes the bars 30% transparent, so that overlapping groups stay visible.

Overlapping Histograms with Pandas
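
Instead of a bin count, bins can also be given as explicit bin edges (matplotlib's hist() accepts a sequence), which makes the bin width easy to reason about and guarantees that all groups share the same bins. A sketch under that assumption, with a made-up wide dataframe toy_wide:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import pandas as pd

# Made-up wide-form data; NaN marks rows that belong to the other group
toy_wide = pd.DataFrame({
    "A": [41.0, 43.0, 47.0, np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, np.nan, 52.0, 58.0, 59.0],
})

# Edges 40, 45, 50, 55, 60 give four bins of width 5
edges = np.arange(40, 65, 5)
ax = toy_wide.plot.hist(bins=edges, alpha=0.7, figsize=(8, 6))

# One rectangle per bin and column; the bar heights add up to the
# number of non-missing values
n_bars = len(ax.patches)
total_count = sum(p.get_height() for p in ax.patches)
```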

3. Scatter Plot with Pandas

We can make a scatter plot of two numerical variables using Pandas’ plot.scatter() function. Here we plot gdpPercap against lifeExp.

gapminder.plot.scatter(x='lifeExp', y='gdpPercap',
                       ylim=(100,200000),
                       logy=True, 
                       figsize=(8,6),
                       alpha=0.3)

Here we also customize the scatter plot by setting y-axis limits with ylim, switching the y-axis to log scale with logy=True, and making the points transparent with alpha=0.3.

Scatter Plot with Pandas
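
Note that logy=True only changes the scale of the axis; the plotted values themselves are not transformed. The sketch below checks this on the returned Axes, using a made-up dataframe (toy and its values are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "lifeExp": [45.0, 55.0, 65.0, 75.0],
    "gdpPercap": [500.0, 2000.0, 9000.0, 40000.0],
})

ax = toy.plot.scatter(x="lifeExp", y="gdpPercap",
                      ylim=(100, 200000), logy=True,
                      figsize=(8, 6), alpha=0.3)

y_scale = ax.get_yscale()                  # scale of the y-axis
points = ax.collections[0].get_offsets()   # (x, y) pairs as plotted
largest_y = float(points[:, 1].max())      # still in original units
```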

4. Hexbin Plot with Pandas

A variant of the scatter plot is the hexbin plot, which bins the points into hexagons and colors each hexagon by the number of points in it. Pandas can make hexbin plots with the plot.hexbin() function.

gapminder['log2_gdpPercap']= np.log2(gapminder['gdpPercap'])
gapminder.plot.hexbin(x='lifeExp', y='log2_gdpPercap', gridsize=20,figsize=(8,6))

In this example, we transform the y-axis variable to log scale before passing it to hexbin() to make the hexbin plot.

Hexbin Plot with Pandas
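
Unlike logy=True, taking np.log2() changes the data itself: each unit step on the new axis corresponds to a doubling of the original value. A small sketch with made-up values chosen as powers of two (toy is a hypothetical dataframe, not gapminder):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import pandas as pd

# Made-up values; gdpPercap doubles from row to row
toy = pd.DataFrame({
    "lifeExp": [45.0, 55.0, 65.0, 75.0],
    "gdpPercap": [512.0, 1024.0, 2048.0, 4096.0],
})

# Each doubling adds exactly 1 on the log2 scale
toy["log2_gdpPercap"] = np.log2(toy["gdpPercap"])
print(toy["log2_gdpPercap"].tolist())

# A coarse grid, since there are only a few points
ax = toy.plot.hexbin(x="lifeExp", y="log2_gdpPercap",
                     gridsize=5, figsize=(8, 6))
```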

5. Boxplots with Pandas

We can make boxplots with Pandas in two ways. We will first use Pandas’ plot() function to make simple boxplots.

The box() function, available through Pandas’ plot(), makes boxplots from data in wide form.

df3 = gapminder[['continent','lifeExp']]
df3.head()

continent	lifeExp
0	Asia	28.801
1	Asia	30.332
2	Asia	31.997
3	Asia	34.020
4	Asia	36.088

So, as before, we first use the pivot() function to reshape the long-form dataframe into wide form.

df3_wide = df3.pivot(columns='continent', values='lifeExp')
df3_wide.head()
continent	Africa	Americas	Asia	Europe	Oceania
0	NaN	NaN	28.801	NaN	NaN
1	NaN	NaN	30.332	NaN	NaN
2	NaN	NaN	31.997	NaN	NaN
3	NaN	NaN	34.020	NaN	NaN
4	NaN	NaN	36.088	NaN	NaN

Then we can use the plot.box() function to make a simple boxplot.

df3_wide.plot.box(figsize=(8,6))

We get a simple boxplot with lifeExp distribution across each continent.

Simple Boxplot with Pandas

Another way to make a boxplot with Pandas is the dataframe’s boxplot() function, which takes the data in long/tidy form. We only need to specify the variable to plot and the variable to group by.

gapminder.boxplot(column='lifeExp',by='continent',
                  figsize=(8,6),
                  fontsize=14)

In this example, we specify the variable we want to plot with the column argument and the grouping variable with the “by” argument.

Pandas’ boxplot() makes a basic boxplot, just like the plot.box() function we saw before.

Boxplot with boxplot() function in Pandas
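
Both routes summarize the same per-group distributions; they differ only in the data layout they expect. The sketch below runs both on a made-up long-form dataframe (long_df and its values are illustrative assumptions) and compares the group medians that the boxes are built around:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-form data for illustration only
long_df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "lifeExp": [30.0, 32.0, 37.0, 70.0, 72.0, 77.0],
})

# Route 1: reshape to wide form, then plot.box()
wide_df = long_df.pivot(columns="continent", values="lifeExp")
wide_df.plot.box(figsize=(8, 6))

# Route 2: keep the long form and let boxplot() do the grouping
long_df.boxplot(column="lifeExp", by="continent", figsize=(8, 6))

# The medians are the same whichever layout we start from
medians_wide = wide_df.median().to_dict()
medians_long = long_df.groupby("continent")["lifeExp"].median().to_dict()

plt.close("all")  # release the two figures
```

When the data is already tidy, boxplot() with the by argument saves the reshaping step.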

6. Barplots with Pandas

We can make bar charts, or barplots, using Pandas’ plot.bar() function. Let us first create a dataframe that counts the non-missing values of each variable per continent in the gapminder data.

gapminder = pd.read_csv(data_url)
gapminder_count=gapminder.groupby('continent').count()
gapminder_count
           country  year  pop  lifeExp  gdpPercap
continent                                        
Africa         624   624  624      624        624
Americas       300   300  300      300        300
Asia           396   396  396      396        396

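Since every row of gapminder is one country in one year, the country column of gapminder_count holds the number of observations per continent (for example, 624 rows for Africa), not the number of distinct countries. We can plot these counts as a barplot with plot.bar().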

gapminder_count['country'].plot.bar(figsize=(8,6), fontsize=12, rot=0)

By default, Pandas’ plot.bar() places the x-axis tick labels vertically. In this example, we have used rot=0 to keep the labels horizontal and easy to read. We have also changed the font size of the tick labels with fontsize=12.

Barplot with Pandas
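
A detail worth keeping in mind: count() tallies non-missing rows, so with one row per country and year it gives the number of observations rather than the number of distinct countries. If distinct countries are what we want, nunique() is one option. A sketch with a made-up dataframe (toy is hypothetical):

```python
import pandas as pd

# Made-up data: three countries, two years each
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "year": [1952, 1957, 1952, 1957, 1952, 1957],
})

# Rows per continent (what count() reports)
rows_per_continent = toy.groupby("continent")["country"].count()

# Distinct countries per continent
countries_per_continent = toy.groupby("continent")["country"].nunique()

print(rows_per_continent.to_dict())
print(countries_per_continent.to_dict())
```

Either Series can be passed to plot.bar() in the same way as gapminder_count['country'] above.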

7. Horizontal Barplots with Pandas

We can also make horizontal barplots easily with Pandas using plot.barh() function as shown below.

gapminder_count['country'].plot.barh(figsize=(8,6), fontsize=12, rot=0)
Horizontal Barplots with Pandas

8. Stacked Barplots with Pandas

We can make stacked barplots using the plot.bar() function in Pandas. By default, plot.bar() uses stacked=False and draws the columns side by side. Passing stacked=True stacks them on top of each other instead.

gapminder_count.plot.bar(stacked=True, figsize=(8,6),rot=0)

With stacked=True, we get a vertically stacked bar chart.

Stacked Barplot with Pandas
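
In a stacked barplot each column's bar starts where the previous column's bar ended, so the total height of a stack equals the row sum. The sketch below checks this on the drawn rectangles, assuming a small made-up table of counts (toy_counts and its column names are hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Made-up counts: two groups (rows), three variables (columns)
toy_counts = pd.DataFrame(
    {"x": [3, 1], "y": [2, 4], "z": [5, 6]},
    index=["Africa", "Asia"],
)

ax = toy_counts.plot.bar(stacked=True, figsize=(8, 6), rot=0)

# Top edge of every rectangle is its bottom plus its height
tops = [p.get_y() + p.get_height() for p in ax.patches]

n_rectangles = len(ax.patches)               # rows x columns
tallest_stack = max(tops)                    # top of the highest stack
largest_row_sum = toy_counts.sum(axis=1).max()
```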

9. Simple Density Plots with Pandas

We can make simple density plots with Pandas’ plot.density() function. We select the variable as a Pandas Series and chain plot.density() to it. Pandas computes the density estimate with SciPy, so SciPy needs to be installed.

gapminder.lifeExp.plot.density(figsize=(8,6),linewidth=4)

In this example, we have changed the default line width of the density plot to 4 with linewidth=4.

Simple density plot with Pandas Python

10. Multiple Density Plots with Pandas

To make multiple density plots, we need the data in wide form, with each group as a separate column of the dataframe. We have already created such a wide dataframe using Pandas’ pivot() function.

df3_wide.head()
continent	Africa	Americas	Asia	Europe	Oceania
0	NaN	NaN	28.801	NaN	NaN
1	NaN	NaN	30.332	NaN	NaN
2	NaN	NaN	31.997	NaN	NaN

We can call plot.density() on the wide dataframe to make multiple density plots with Pandas.

df3_wide.plot.density(figsize=(8,6),linewidth=4)
Multiple Density Plots with Pandas Python

11. Multiple Density Plots using kde() function with Pandas

Pandas’ plot.kde() function can also make density plots; plot.density() is simply another name for it. Here is an example of using plot.kde() to make multiple density plots.

df3_wide.plot.kde(figsize=(8,6),linewidth=4)

We get the same density plot as with the plot.density() function.

Multiple Density Plots using kde()
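
The two names can be compared directly, and the smoothness of the curves can be tuned with the bw_method argument of plot.kde(). The sketch below uses a made-up wide dataframe (toy_wide is hypothetical) and only draws the plot if SciPy is installed, since Pandas relies on it for the density estimate:

```python
import importlib.util

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# On the plot accessor, density and kde refer to the same method
same_method = pd.DataFrame.plot.density is pd.DataFrame.plot.kde
print(same_method)

# Made-up wide-form data for illustration only
toy_wide = pd.DataFrame({
    "A": [40.0, 42.0, 45.0, 47.0, 50.0],
    "B": [60.0, 63.0, 65.0, 70.0, 72.0],
})

# Only plot when SciPy is available
n_curves = None
if importlib.util.find_spec("scipy") is not None:
    # A smaller bw_method gives a wigglier curve, a larger one a smoother curve
    ax = toy_wide.plot.kde(bw_method=0.5, linewidth=2, figsize=(8, 6))
    n_curves = len(ax.get_lines())  # one curve per column
```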

To summarize, we worked through examples of several commonly used statistical visualizations and saw how quickly Pandas can produce them. Some plots may be difficult to customize through Pandas alone, but because Pandas uses matplotlib under the hood, they can be tweaked further with some knowledge of matplotlib. Happy exploring and plotting with Pandas.