Scatter plots are a useful visualization when you have two quantitative variables and want to understand the relationship between them.
In this post we will see examples of making scatter plots using Seaborn in Python. We will first make a simple scatter plot and improve it iteratively.
Let us first load the packages we need to make scatter plots in Python.
# import pandas
import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
%matplotlib inline
We will use the gapminder data to make scatter plots. Let us load the gapminder data from the Software Carpentry GitHub page.
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))
We can make scatter plots using Seaborn in multiple ways. Let us use Seaborn’s regplot to make a simple scatter plot from the gapminder data frame.
We will put gdpPercap on the x-axis and lifeExp on the y-axis. Seaborn’s regplot takes the x and y variables as column names, and we pass the data frame through the “data” argument. We also specify “fit_reg=False” so that regplot does not fit a linear model and draw a regression line.
sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, fit_reg=False)
We can also get the same scatter plot as above, by directly feeding the x and y variables from the gapminder dataframe as shown below.
sns.regplot(x=gapminder["gdpPercap"], y=gapminder["lifeExp"], fit_reg=False)
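Since there are multiple ways to make a scatter plot with Seaborn, here is a minimal sketch of one alternative, sns.scatterplot, which is available in Seaborn 0.9 and later. To keep the example self-contained it uses a small synthetic data frame named toy (a hypothetical stand-in, not the real gapminder data) that only borrows the gapminder column names.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for gapminder (assumption: values are randomly generated,
# only the column names match the real data set)
rng = np.random.default_rng(0)
gdp = 10 ** rng.uniform(2.5, 4.7, size=200)
life = 20 + 13 * np.log10(gdp) + rng.normal(0, 3, size=200)
toy = pd.DataFrame({"gdpPercap": gdp, "lifeExp": life})

# scatterplot draws points only, so there is no regression fit to switch off
fig, ax = plt.subplots()
sns.scatterplot(x="gdpPercap", y="lifeExp", data=toy, ax=ax)
```

With the real data, replacing toy with gapminder should give a plot comparable to the regplot versions above.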
How to Add Log Scale to Scatter Plot in Python?
Our first attempt at making a scatter plot using Seaborn in Python was successful. However, if you look at the scatter plot, most of the points are clumped in a small region of the x-axis and the pattern we see is dominated by the outliers.
A better way to make the scatter plot is to change the x-axis to a log scale. To do that, we first make the scatter plot with Seaborn and save it to a variable, and then use its set function to specify xscale="log".
splot = sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, fit_reg=False)
splot.set(xscale="log")
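The object that regplot returns is a matplotlib Axes, so the log scale can also be set with the Axes method set_xscale. The sketch below shows that variant on a synthetic data frame named toy (a hypothetical stand-in that reuses the gapminder column names, not the real data).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for gapminder (assumption: randomly generated values)
rng = np.random.default_rng(0)
gdp = 10 ** rng.uniform(2.5, 4.7, size=200)
life = 20 + 13 * np.log10(gdp) + rng.normal(0, 3, size=200)
toy = pd.DataFrame({"gdpPercap": gdp, "lifeExp": life})

# Create the Axes explicitly and let regplot draw on it
fig, ax = plt.subplots()
sns.regplot(x="gdpPercap", y="lifeExp", data=toy, fit_reg=False, ax=ax)

# Same effect as splot.set(xscale="log")
ax.set_xscale("log")
```

Creating the Axes up front is mostly a matter of taste; it can be convenient when the same figure needs further matplotlib tweaks such as titles or axis limits.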
Now the scatter plot makes more sense: with gdpPercap on a log scale, we see a roughly linear pattern between lifeExp and gdpPercap. However, a lot of data points overlap each other. It would be nice to add a bit of transparency to the scatter plot.
We can use scatter_kws, a dictionary of keyword arguments for the scatter points, to adjust the transparency level with the key “alpha”.
splot = sns.regplot(x="gdpPercap", y="lifeExp", data=gapminder, scatter_kws={'alpha':0.15}, fit_reg=False)
splot.set(xscale="log")
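To see what the alpha setting changes, it can help to draw the same points twice, once fully opaque and once with alpha set to 0.15. The sketch below does this side by side on a synthetic data frame named toy (a hypothetical stand-in with the gapminder column names, not the real data).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for gapminder (assumption: randomly generated values)
rng = np.random.default_rng(0)
gdp = 10 ** rng.uniform(2.5, 4.7, size=500)
life = 20 + 13 * np.log10(gdp) + rng.normal(0, 3, size=500)
toy = pd.DataFrame({"gdpPercap": gdp, "lifeExp": life})

# One panel per alpha value, sharing the y-axis for easy comparison
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, alpha in zip(axes, (1.0, 0.15)):
    sns.regplot(x="gdpPercap", y="lifeExp", data=toy,
                scatter_kws={"alpha": alpha}, fit_reg=False, ax=ax)
    ax.set(xscale="log", title=f"alpha = {alpha}")
```

In the more transparent panel, darker regions correspond to places where many points overlap, which is the information that gets lost when every point is fully opaque.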