How to Make a Scatter Plot in Python with Seaborn?

Scatter plots are a useful visualization when you have two quantitative variables and want to understand the relationship between them.

In this post we will see examples of making scatter plots using Seaborn in Python. We will first make a simple scatter plot and improve it iteratively.

Let us first load the packages we need to make scatter plots in Python.

# import pandas
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
# Jupyter/IPython magic to show plots inline; omit it in a plain Python script
%matplotlib inline

We will use the gapminder data to make scatter plots. Let us load the gapminder data from the Software Carpentry GitHub page.

data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

We can make scatter plots using Seaborn in multiple ways. Let us use Seaborn’s regplot to make a simple scatter plot using the gapminder data frame.

We will put gdpPercap on the x-axis and lifeExp on the y-axis. Seaborn’s regplot takes the names of the x and y variables, and we pass the data frame through the “data” argument. We also specify “fit_reg=False” to disable fitting a linear model and drawing the regression line.

sns.regplot(x="gdpPercap", y="lifeExp",
            data=gapminder,fit_reg=False)
Figure: Scatter plot with Seaborn in Python

We can also get the same scatter plot by passing the x and y columns of the gapminder data frame directly, as shown below.

sns.regplot(x=gapminder["gdpPercap"], y=gapminder["lifeExp"],
            fit_reg=False)
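
As noted earlier, Seaborn offers more than one way to make a scatter plot. One alternative is the scatterplot function (available in Seaborn 0.9 and later), which draws only the points and so needs no fit_reg argument. The sketch below is illustrative: the small demo data frame and its values are made up so the example runs without downloading anything, and only the column names match gapminder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in rows that reuse the gapminder column names (not real data)
demo = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4800.0, 15000.0, 42000.0],
    "lifeExp": [45.0, 52.0, 63.0, 72.0, 80.0],
})

# scatterplot draws only the points; it returns the matplotlib Axes it drew on
ax = sns.scatterplot(x="gdpPercap", y="lifeExp", data=demo)
```

With the real data, the same call would take data=gapminder instead of the demo frame.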

How to Add Log Scale to Scatter Plot in Python?

Our first attempt at making a scatter plot using Seaborn in Python was successful. However, most of the points are clumped in a small region of the x-axis, and the pattern we see is dominated by the outliers.

A better way to make the scatter plot is to change the x-axis to a log scale. To do that, we first make the scatter plot with Seaborn, save the returned object to a variable, and then use its set function to specify ‘xscale="log"’.

splot = sns.regplot(x="gdpPercap", y="lifeExp", 
                    data=gapminder, fit_reg=False)
splot.set(xscale="log")
Figure: Scatter plot with log scale, Seaborn in Python
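
The set call works because regplot returns a matplotlib Axes object, so any Axes method is available on the saved variable. A minimal sketch of the same idea calls the Axes method set_xscale directly. The demo data frame here is a made-up stand-in with the gapminder column names, used only so the example runs offline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.axes
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in rows that reuse the gapminder column names (not real data)
demo = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4800.0, 15000.0, 42000.0],
    "lifeExp": [45.0, 52.0, 63.0, 72.0, 80.0],
})

# regplot returns the matplotlib Axes that holds the plot
splot = sns.regplot(x="gdpPercap", y="lifeExp", data=demo, fit_reg=False)

# Equivalent to splot.set(xscale="log"), written as a direct Axes method call
splot.set_xscale("log")
xscale = splot.get_xscale()
```

Either form should give the same plot; the set function is convenient when several axis properties are changed at once.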

With the x-axis on a log scale, we see a roughly linear pattern between lifeExp and gdpPercap, and the scatter plot makes more sense. However, many data points overlap one another, so it would be nice to add a bit of transparency to the scatter plot.

We can use the scatter_kws argument to adjust the transparency level by passing a dictionary with the key “alpha”. Smaller alpha values make the points more transparent.

splot = sns.regplot(x="gdpPercap", y="lifeExp", 
                    data=gapminder,
                    scatter_kws={'alpha':0.15},
                    fit_reg=False)
splot.set(xscale="log")

Figure: Scatter plot with transparency
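
The entries of scatter_kws are forwarded to matplotlib's scatter call, which is why the “alpha” key controls transparency. The sketch below checks this by reading the alpha value back from the plotted points. The demo data frame is a made-up stand-in with the gapminder column names, not real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in rows that reuse the gapminder column names (not real data)
demo = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4800.0, 15000.0, 42000.0],
    "lifeExp": [45.0, 52.0, 63.0, 72.0, 80.0],
})

splot = sns.regplot(x="gdpPercap", y="lifeExp", data=demo,
                    scatter_kws={"alpha": 0.15}, fit_reg=False)

# The points are stored as the first collection on the Axes
points = splot.collections[0]
alpha = points.get_alpha()
```

An alpha of 0 is fully transparent and 1 is fully opaque, so values such as 0.15 let dense regions of overlapping points appear darker.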