In this post we will see examples of making scatter plots using Seaborn in Python. We will first make a simple scatter plot and improve it iteratively.
Let us first load the packages we need to make scatter plots in Python.
# import pandas
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
%matplotlib inline
We will use the gapminder data to make scatter plots. Let us load the gapminder data from the Software Carpentry GitHub page.
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))
We can make scatter plots using Seaborn in multiple ways. Let us use Seaborn’s regplot to make a simple scatter plot using the gapminder data frame.
We will put gdpPercap on the x-axis and lifeExp on the y-axis. Seaborn’s regplot takes the x and y variables, and we pass the data frame through the “data” argument. We also specify “fit_reg=False” to disable fitting a linear model and drawing its regression line.
sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, fit_reg=False)

We can also get the same scatter plot as above, by directly feeding the x and y variables from the gapminder dataframe as shown below.
sns.regplot(x=gapminder["gdpPercap"], y=gapminder["lifeExp"], fit_reg=False)
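Seaborn 0.9 and later also provides a dedicated scatterplot function, which is one more of the multiple ways mentioned earlier. Here is a minimal sketch of it. To keep the example runnable offline, it uses a small hand-made data frame called toy, a hypothetical stand-in with made-up values that only borrows the gapminder column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the gapminder data frame: the values are made up,
# only the column names match the real data.
toy = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4000.0, 15000.0, 40000.0],
    "lifeExp": [42.0, 51.0, 63.0, 72.0, 80.0],
})

# scatterplot draws only the points, so there is no fit_reg argument to turn off
ax = sns.scatterplot(x="gdpPercap", y="lifeExp", data=toy)

# the axis labels are taken from the column names
print(ax.get_xlabel(), ax.get_ylabel())
```

With the real data, replacing toy with gapminder should give the same kind of plot as the regplot calls above.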
How to Add Log Scale to Scatter Plot in Python?
Our first attempt at making a scatter plot using Seaborn in Python was successful. However, if you look at the scatter plot, most of the points are clumped in a small region of the x-axis and the pattern we see is dominated by the outliers.
A better way to make the scatter plot is to change the x-axis to a log scale. To do that, we first make the scatter plot with Seaborn and save it to a variable, and then use its set function to specify xscale="log".
splot = sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, fit_reg=False)
splot.set(xscale="log")
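The object returned by regplot is a regular Matplotlib Axes, so the same log scale can also be set with the Axes method set_xscale. The sketch below illustrates this on a small hand-made data frame called toy (a hypothetical stand-in with made-up values, used so the example runs without downloading gapminder).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for gapminder: made-up values, real column names.
toy = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4000.0, 15000.0, 40000.0],
    "lifeExp": [42.0, 51.0, 63.0, 72.0, 80.0],
})

# Create the Axes explicitly and hand it to regplot
fig, ax = plt.subplots()
sns.regplot(x="gdpPercap", y="lifeExp", data=toy, fit_reg=False, ax=ax)

# Switch only the x-axis to a log scale; the y-axis stays linear
ax.set_xscale("log")
print(ax.get_xscale(), ax.get_yscale())
```

Creating the Axes up front is a matter of taste; it can be convenient when several plots share one figure.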

With the x-axis on a log scale, we see a roughly linear pattern between lifeExp and gdpPercap, and the scatter plot makes more sense. However, a lot of data points overlap one another. It would be nice to add a bit of transparency to the scatter plot.
We can use scatter_kws to adjust the transparency level using a dictionary with key “alpha”.
splot = sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, scatter_kws={'alpha':0.15}, fit_reg=False)
splot.set(xscale="log")
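The scatter_kws dictionary is forwarded to Matplotlib's scatter call, so other scatter options can sit next to "alpha". As a small sketch, the example below also sets the marker size with the key "s"; the value 20 is an arbitrary choice for illustration, and toy is again a hypothetical stand-in with made-up values rather than the real gapminder data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for gapminder: made-up values, real column names.
toy = pd.DataFrame({
    "gdpPercap": [500.0, 1200.0, 4000.0, 15000.0, 40000.0],
    "lifeExp": [42.0, 51.0, 63.0, 72.0, 80.0],
})

# alpha controls transparency, s controls marker area (in points squared)
ax = sns.regplot(x="gdpPercap", y="lifeExp", data=toy,
                 scatter_kws={"alpha": 0.15, "s": 20}, fit_reg=False)
ax.set(xscale="log")

# The drawn points are stored as the first collection on the Axes
points = ax.collections[0]
print(points.get_alpha())
```

Lower alpha values make dense regions easier to read, since overlapping points add up to a darker color.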