The Python Pandas library is well known for its data munging capabilities. However, a somewhat underused feature of Pandas is its plotting capabilities. Yes, one can make more polished visualizations with Matplotlib, Seaborn, or Altair. However, Pandas plotting can be extremely handy when you are in exploratory data analysis mode and want to make data visualizations quickly on the fly.
In this post, we will see 13 tips with complete code and data to make the most of Pandas plotting for the commonly used data visualization plots. We will mostly use Pandas’ plot() function and make quick exploratory visualizations including line plots, boxplots, barplots, and density plots.
Let us load Pandas, NumPy, and matplotlib to make plots with Pandas.
# import pandas
import pandas as pd
# import numpy
import numpy as np
# import matplotlib
import matplotlib.pyplot as plt
We will use gapminder data in this post.
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
One of the good things about plotting with Pandas is that its plot() function can handle multiple types of common plots, each available as a method such as plot.line() or plot.hist(). We will use it for most of our examples.
1. Line Plots with Pandas
We can make line plots with Pandas using the plot.line() method, chained directly to the dataframe as df.plot.line(). We need to specify which columns of the dataframe go on the x-axis and the y-axis.
When plotting with Pandas, we can specify the plot size using the figsize argument of plot.line().
In this example, we specify the size as the tuple (8,6). We also save the plot using matplotlib.pyplot’s savefig() function.
df_uk = gapminder.query('country=="United Kingdom"')
df_uk.plot.line(x='lifeExp', y='gdpPercap', figsize=(8,6))
plt.savefig("Line_Plot_with_Pandas_Python.jpg")
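As a small sketch of how figsize and saving fit together: plot.line() returns the matplotlib Axes it drew on, so the figure can also be saved through that object instead of through plt.savefig(). The toy dataframe and its numbers are made up for illustration and are not gapminder values. The Agg backend line is only there so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up numbers for illustration only (not real gapminder values)
toy = pd.DataFrame({
    "year": [1990, 1995, 2000, 2005],
    "lifeExp": [70.0, 71.5, 73.0, 74.2],
})

# plot.line() returns the matplotlib Axes it drew on
ax = toy.plot.line(x="year", y="lifeExp", figsize=(8, 6))
ax.set_ylabel("lifeExp")

# Save through the Axes' parent figure; a file name works in place of the buffer
fig = ax.get_figure()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Saving through the figure object avoids relying on whichever figure matplotlib currently considers active, which can matter when several plots are made in a row.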
2. Histogram with Pandas
We can make a histogram with Pandas by calling plot.hist() on the Series containing the variable. In this example, we make a histogram of the lifeExp variable from the gapminder dataframe. One of the key arguments to the histogram function is the number of bins. Here we set it to 100 with the bins=100 argument.
gapminder['lifeExp'].plot.hist(bins=100, figsize=(8,6))
We can also make multiple overlapping histograms with Pandas’ plot.hist() function. However, it expects the dataframe to be in wide form, with each group that we want a separate histogram for in its own column.
We can reshape our dataframe from long form to wide form using pivot function as shown below.
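# keep only the two columns we need (df2 is assumed to be this subset of gapminder)
df2 = gapminder[['continent', 'lifeExp']]
df2_wide = df2.pivot(columns='continent', values='lifeExp')
df2_wide.head(n=3)
continent  Africa  Americas    Asia  Europe  Oceania
0             NaN       NaN  28.801     NaN      NaN
1             NaN       NaN  30.332     NaN      NaN
2             NaN       NaN  31.997     NaN      NaN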
Now each group of the histogram is a separate variable in the dataframe and we can use plot.hist() to make overlapping histograms.
df2_wide.plot.hist(bins=100, figsize=(8,6), alpha=0.7)
plt.savefig("multiple_overlapping_histograms_with_Pandas_Python.jpg")
Pandas nicely colors each group in a different color. In this example, we set alpha=0.7, which makes the bars 70% opaque (30% transparent) so that overlapping groups stay visible.
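To see what the long-to-wide reshape does, here is a minimal sketch on a made-up five-row dataframe (the values are illustrative, not gapminder data). With no index argument, pivot() keeps the existing row index, so each row ends up with a value in exactly one continent column and NaN everywhere else.

```python
import pandas as pd

# Made-up long-form data: one row per observation
long_df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "lifeExp": [60.0, 65.0, 75.0, 78.0, 50.0],
})

# One column per continent; the original row index is preserved
wide_df = long_df.pivot(columns="continent", values="lifeExp")
print(wide_df)

# Number of non-missing values per continent column
print(wide_df.notna().sum())
```

plot.hist() ignores the NaN entries, which is why this sparse wide layout still gives one histogram per group.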
3. Scatter Plot with Pandas
We can make scatter plots between two numerical variables using Pandas plot.scatter() function. Here we make a scatter plot between lifeExp and gdpPercap using Pandas plot.scatter() function.
gapminder.plot.scatter(x='lifeExp', y='gdpPercap', ylim=(100,200000), logy=True, figsize=(8,6), alpha=0.3)
Here we also customize the scatter plot by setting the y-axis limits with ylim, putting the y-axis on a log scale with logy=True, and adding transparency with alpha=0.3.
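A quick way to convince yourself that logy=True only changes how the axis is drawn, and not the data, is to inspect the returned Axes. This is a sketch on made-up values. The Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "lifeExp": [40.0, 55.0, 70.0, 80.0],
    "gdpPercap": [500.0, 2000.0, 10000.0, 40000.0],
})

ax = toy.plot.scatter(x="lifeExp", y="gdpPercap", logy=True,
                      ylim=(100, 200000), alpha=0.3, figsize=(8, 6))

# The axis scale is logarithmic...
print(ax.get_yscale())
# ...but the column in the dataframe is untouched
print(toy["gdpPercap"].max())
```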
4. Hexbin Plot with Pandas
Another variant of the scatter plot is the hexbin plot, which we can make with Pandas’ plot.hexbin() function.
gapminder['log2_gdpPercap'] = np.log2(gapminder['gdpPercap'])
gapminder.plot.hexbin(x='lifeExp', y='log2_gdpPercap', gridsize=20, figsize=(8,6))
In this example, we log2-transform the y-axis variable before passing it to plot.hexbin() to make the hexbin plot.
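The sketch below illustrates what the hexbin plot encodes: each hexagon stores the number of points that fall inside it, so the counts over all hexagons add up to the number of rows. The data are random numbers generated for illustration, and reading the counts back from ax.collections is an inspection trick, not something the plot requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

# Random, made-up data for illustration only
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "lifeExp": rng.uniform(30, 80, size=200),
    "gdpPercap": rng.uniform(300, 50000, size=200),
})
toy["log2_gdpPercap"] = np.log2(toy["gdpPercap"])

ax = toy.plot.hexbin(x="lifeExp", y="log2_gdpPercap",
                     gridsize=20, figsize=(8, 6))

# The hexagons are a single matplotlib collection; its array holds the counts
counts = ax.collections[0].get_array()
print(int(counts.sum()), len(toy))
```

A larger gridsize gives smaller hexagons and a finer, but noisier, picture of the density.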
5. Boxplots with Pandas
We can make boxplots with Pandas in two ways. We will first use Pandas’ plot() function to make simple boxplots.
The box() function, available through Pandas’ plot(), makes boxplots from data in wide form.
df3 = gapminder[['continent','lifeExp']]
df3.head()
  continent  lifeExp
0      Asia   28.801
1      Asia   30.332
2      Asia   31.997
3      Asia   34.020
4      Asia   36.088
So, we first use the pivot() function to reshape the long-form dataframe into wide form, as before.
# pivot df3, the long-form dataframe created above
df3_wide = df3.pivot(columns='continent', values='lifeExp')
df3_wide.head()
continent  Africa  Americas    Asia  Europe  Oceania
0             NaN       NaN  28.801     NaN      NaN
1             NaN       NaN  30.332     NaN      NaN
2             NaN       NaN  31.997     NaN      NaN
3             NaN       NaN  34.020     NaN      NaN
4             NaN       NaN  36.088     NaN      NaN
Then, we can use plot.box() function to make simple boxplot.
df3_wide.plot.box(figsize=(8,6))
We get a simple boxplot with lifeExp distribution across each continent.
Another way to make a boxplot with Pandas is to use the dataframe’s boxplot() function, which can take the data in long/tidy form. We need to specify the variable to plot and the variable to group the data by.
gapminder.boxplot(column='lifeExp',by='continent', figsize=(8,6), fontsize=14)
In this example, we specify the variable we want to plot with the column argument and the variable we want to group by with the “by” argument.
Pandas boxplot() makes a basic boxplot just like Pandas plot.box() function we saw before.
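When a boxplot looks surprising, it can help to check the numbers behind it. The sketch below computes the quartiles that each box summarizes with groupby(), using a made-up long-form dataframe (illustrative values, not gapminder data).

```python
import pandas as pd

# Made-up long-form data for illustration only
toy = pd.DataFrame({
    "continent": ["Asia"] * 5 + ["Europe"] * 5,
    "lifeExp": [50.0, 55.0, 60.0, 65.0, 70.0,
                70.0, 72.0, 74.0, 76.0, 78.0],
})

# Lower quartile, median, and upper quartile per group:
# the bottom, middle line, and top of each box
summary = (
    toy.groupby("continent")["lifeExp"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
print(summary)
```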
6. Barplots with Pandas
We can make Barcharts or barplots using Pandas’ plot.bar() function. Let us first create a dataframe with counts of each variable for each continent from gapminder data.
gapminder = pd.read_csv(data_url)
gapminder_count = gapminder.groupby('continent').count()
gapminder_count
           country  year  pop  lifeExp  gdpPercap
continent
Africa         624   624  624      624        624
Americas       300   300  300      300        300
Asia           396   396  396      396        396
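Each column holds the number of rows (country-year observations) per continent, not the number of distinct countries. We can pick any one of them, here the country column, and make a barplot of the counts using plot.bar().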
gapminder_count['country'].plot.bar(figsize=(8,6), fontsize=12, rot=0)
By default, Pandas’ barplot function plot.bar() places the x-axis tick labels vertically. In this example, we have used rot=0 to make the labels easy to read. We have also changed the font size of the tick labels on the barplot with fontsize=12.
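Note that groupby().count() counts rows, so a country that appears in several years is counted several times. If the goal is the number of distinct countries per continent, one option is nunique(). The sketch below contrasts the two on a made-up dataframe. The country names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up data: three hypothetical countries, two years each
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "year": [1952, 1957] * 3,
})

# count() gives rows per continent; nunique() gives distinct countries
rows_per_continent = toy.groupby("continent")["country"].count()
countries_per_continent = toy.groupby("continent")["country"].nunique()

ax = countries_per_continent.plot.bar(figsize=(8, 6), fontsize=12, rot=0)
```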
7. Horizontal Barplots with Pandas
We can also make horizontal barplots easily with Pandas using plot.barh() function as shown below.
gapminder_count['country'].plot.barh(figsize=(8,6), fontsize=12, rot=0)
8. Stacked Barplots with Pandas
We can make stacked barplots using the plot.bar() function in Pandas. By default, plot.bar() uses stacked=False and draws the columns side by side. Passing stacked=True to plot.bar() makes a stacked barplot instead.
gapminder_count.plot.bar(stacked=True, figsize=(8,6),rot=0)
With stacked=True, we get vertically stacked barchart.
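As a rough illustration of what stacking means, the sketch below builds a small made-up table of counts (the column names before_1980 and after_1980 are hypothetical) and checks that the top of each stack equals the row total. Reading the bars back through ax.containers is just for inspection.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up counts for illustration only
toy = pd.DataFrame(
    {"before_1980": [3, 1], "after_1980": [2, 4]},
    index=["Asia", "Europe"],
)

ax = toy.plot.bar(stacked=True, figsize=(8, 6), rot=0)

# Each column is one container of bars; the last one sits on top of the stack
top_segments = ax.containers[-1]
stack_tops = [bar.get_y() + bar.get_height() for bar in top_segments]
print(stack_tops)
print(toy.sum(axis=1).tolist())
```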
9. Simple Density Plots with Pandas
We can make simple density plots with Pandas using the plot.density() function. We select the variable we want as a Pandas Series and chain plot.density() to it. Note that plot.density() needs SciPy to be installed.
gapminder.lifeExp.plot.density(figsize=(8,6),linewidth=4)
In this example, we have changed the default line width of the density plot to 4 with linewidth=4.
10. Multiple Density Plots with Pandas
To make multiple density plot we need the data in wide form with each group of data as a variable in the wide data frame. We have already created wide data frame using Pandas’ pivot() function.
df3_wide.head()
continent  Africa  Americas    Asia  Europe  Oceania
0             NaN       NaN  28.801     NaN      NaN
1             NaN       NaN  30.332     NaN      NaN
2             NaN       NaN  31.997     NaN      NaN
We can call plot.density() function on the wide dataframe and make multiple density plots with Pandas.
df3_wide.plot.density(figsize=(8,6),linewidth=4)
11. Multiple Density Plots using kde() function with Pandas
Pandas plot.kde() function can also make density plot. Here is an example of using plot.kde() function to make multiple density plots.
df3_wide.plot.kde(figsize=(8,6),linewidth=4)
We get the same density plot as with plot.density() function.
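The two plots match because, in the Pandas versions I am aware of, plot.density() is defined as an alias of plot.kde(). A minimal check, assuming the PlotAccessor class exposed in pandas.plotting, is sketched below. It only compares the two methods and does not draw anything.

```python
from pandas.plotting import PlotAccessor

# density and kde should refer to the same underlying function
same_function = PlotAccessor.density is PlotAccessor.kde
print(same_function)
```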
To summarize, through multiple examples of commonly used statistical data visualizations, we saw the power of Pandas to make such plots quickly. Some of the plots may be difficult to customize through Pandas alone, but Pandas uses matplotlib under the hood, so it is possible to tweak them with a knowledge of matplotlib. Happy exploring and plotting with Pandas.
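As a closing sketch of that matplotlib tweaking: Pandas plotting functions accept an ax argument, so plots can be placed on axes created with plt.subplots() and then adjusted with ordinary matplotlib calls. The data below are made up for illustration, and the titles are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "lifeExp": [40.0, 55.0, 62.0, 70.0, 80.0],
    "gdpPercap": [500.0, 2000.0, 4000.0, 10000.0, 40000.0],
})

# Create the axes with matplotlib, then hand them to Pandas
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 5))
toy["lifeExp"].plot.hist(bins=5, ax=ax_left)
toy.plot.scatter(x="lifeExp", y="gdpPercap", logy=True, ax=ax_right)

# Ordinary matplotlib customization on the same axes
ax_left.set_title("Distribution of lifeExp")
ax_left.set_xlabel("lifeExp")
ax_right.set_title("lifeExp vs gdpPercap")
fig.tight_layout()
```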