How To Specify Colors to Scatter Plots in Python

Scatter plots are extremely useful for analyzing the relationship between two quantitative variables in a data set. Datasets often contain several quantitative and categorical variables, and we may be interested in the relationship between two quantitative variables with respect to a third, categorical variable.

Coloring the points by that grouping variable makes such a scatter plot much easier to read. In this post we will see examples of making scatter plots and coloring the data points using Seaborn in Python. We will use the combination of hue and palette to color the data points in a scatter plot.

Let us first load packages we need.

import pandas as pd
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
%matplotlib inline

We will use gapminder data to make scatter plots.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
print(gapminder.head(3))

The gapminder data set contains data over many years. We will subset the data by filtering rows for two specific years.

gapminder=gapminder[gapminder.year.isin([2002,1962])]
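To see what this filter keeps, here is a minimal sketch on a small made-up data frame. The frame `toy` and its values are assumptions for illustration only and are not part of gapminder.

```python
import pandas as pd

# Made-up stand-in for gapminder: two countries observed in three years
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [1962, 1982, 2002, 1962, 1982, 2002],
})

# isin() builds a boolean mask that is True only for the listed years
subset = toy[toy["year"].isin([2002, 1962])]

# Collect the distinct years that survive the filter
years_kept = sorted(subset["year"].unique().tolist())
print(years_kept)
```

Only the rows for the two listed years remain, whatever order the years are given in.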

Scatterplot with Seaborn Default Colors

Seaborn has a handy function named scatterplot to make scatter plots in Python. Note that one could also use other functions like regplot.

We provide the Pandas data frame and the variables for the x and y arguments to the scatterplot function. In addition to these arguments, we can use hue to specify that we want to color the data points based on another grouping variable. This will produce points with different colors.

g = sns.scatterplot(x="gdpPercap", y="lifeExp",
                    hue="continent",
                    data=gapminder)
g.set(xscale="log");

In our example we also scale the x-axis to log scale to make it easy to see the relationship between the two variables.
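The object returned by scatterplot is a matplotlib Axes, so the log scaling is an ordinary Axes setting. The sketch below illustrates this on a made-up frame (`toy` and its numbers are assumptions) and reads the scale back to confirm it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import pandas as pd
import seaborn as sns

# Made-up values standing in for the gapminder columns
toy = pd.DataFrame({
    "gdpPercap": [500.0, 5000.0, 50000.0],
    "lifeExp": [45.0, 65.0, 80.0],
    "continent": ["Africa", "Asia", "Europe"],
})

ax = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent", data=toy)

# Equivalent to ax.set(xscale="log")
ax.set_xscale("log")

# Read the setting back from the Axes
scale = ax.get_xscale()
print(scale)
```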

Manually specifying colors as list for scatterplot with Seaborn using palette

The above scatter plot made by Seaborn looks great. However, we often want to choose specific colors rather than the defaults picked by Seaborn. To color the data points with specific colors, we can use the argument palette. We can pass the colors we want as a list to the palette argument.

In our example below, we specify the colors we want as a list: ['green', 'orange', 'brown', 'dodgerblue', 'red'].

g = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent",
                    data=gapminder,
                    palette=['green', 'orange', 'brown', 'dodgerblue', 'red'],
                    legend='full')
g.set(xscale="log")

Note that the data points on the scatter plot are now colored with the colors we specified.
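A list palette is matched to the hue levels by position, so which continent gets which color depends on the order of the levels. One way to make that pairing explicit is the hue_order argument of scatterplot. The sketch below uses a made-up frame (`toy`) and an alphabetical order chosen purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import pandas as pd
import seaborn as sns

# Made-up values, one row per continent
toy = pd.DataFrame({
    "gdpPercap": [800.0, 9000.0, 1200.0, 7000.0, 30000.0],
    "lifeExp": [43.0, 72.0, 50.0, 70.0, 80.0],
    "continent": ["Asia", "Europe", "Africa", "Americas", "Oceania"],
})

colors = ["green", "orange", "brown", "dodgerblue", "red"]
order = ["Africa", "Americas", "Asia", "Europe", "Oceania"]  # illustrative order

# The i-th color goes to the i-th level in hue_order
pairing = dict(zip(order, colors))
print(pairing)

ax = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent",
                     hue_order=order, palette=colors,
                     data=toy, legend="full")
ax.set(xscale="log")
```

With hue_order given, the legend and the colors follow the stated order rather than the order in which the continents happen to appear in the data.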

Manually specifying colors as a dictionary for scatterplot with Seaborn using palette

Another option to manually specify colors for scatter plots in Python is to assign a color to each value of the variable of interest using a dictionary.

In our example, we specify a color for each continent using a Python dictionary.

color_dict = {'Africa': 'brown',
              'Asia': 'green',
              'Europe': 'orange',
              'Oceania': 'red',
              'Americas': 'dodgerblue'}

We can use the color dictionary for the argument palette and make scatter plots.

g = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent",
                    data=gapminder, palette=color_dict,
                    legend='full')
g.set(xscale="log")

And we get the scatterplot colored by the colors specified in the dictionary.
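A dictionary palette needs an entry for every level of the hue variable, so it can help to compare the dictionary keys with the values in the column before plotting. The check below is a minimal sketch on a made-up column (`toy` is an assumption); the dictionary is the same as in the example above.

```python
import pandas as pd

# Made-up continent column standing in for gapminder["continent"]
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Africa", "Americas", "Oceania", "Asia"],
})

color_dict = {"Africa": "brown",
              "Asia": "green",
              "Europe": "orange",
              "Oceania": "red",
              "Americas": "dodgerblue"}

levels = set(toy["continent"].unique())

# Levels in the data that have no color assigned
missing = levels - set(color_dict)
# Dictionary keys that never occur in the data
unused = set(color_dict) - levels

if missing:
    raise ValueError(f"No color specified for: {sorted(missing)}")
print("missing:", missing, "unused:", unused)
```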

These are not the only options to color the data points with Seaborn. Seaborn offers rich color palettes to color the data points. See https://seaborn.pydata.org/tutorial/color_palettes.html for details.

Let us choose a color palette that is colorblind friendly. Seaborn’s colorblind palette provides this option.

g = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent",
                    data=gapminder, palette='colorblind',
                    legend='full')
g.set(xscale="log")

Now we have colored the data points by continent using colorblind friendly colors.
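To see which colors a named palette contains, or to reuse them in a dictionary, the palette can be requested directly with color_palette. The sketch below is one possible way to do this; the continent order in `continents` is an assumption made for illustration.

```python
import seaborn as sns

# Illustrative level order; any order can be used
continents = ["Africa", "Americas", "Asia", "Europe", "Oceania"]

# Ask for as many colorblind-friendly colors as there are levels
palette = sns.color_palette("colorblind", n_colors=len(continents))

# Convert the RGB tuples to hex strings such as '#rrggbb'
hex_codes = palette.as_hex()

# Pair each continent with a color; usable as palette=colorblind_dict
colorblind_dict = dict(zip(continents, hex_codes))
print(colorblind_dict)
```

Building the dictionary this way keeps the colorblind-friendly colors while fixing which continent gets which one.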