displot() for univariate and bivariate distributions
One of the big changes in Seaborn version 0.11 is the “modernization of distribution functions”. The new version adds three functions, displot(), histplot() and ecdfplot(), that make visualizing distributions easier. Yes, we no longer have to write our own function to make an ECDF plot.
Seaborn’s displot() can be used for visualizing both univariate and bivariate distributions. Among these three new functions, displot() provides a figure-level interface to the common distribution plots in seaborn: histograms (histplot), density plots (kdeplot), empirical cumulative distributions (ecdfplot), and rug plots (rugplot). For example, we can use displot() to create a histogram with kind="hist" (the default), a density plot with kind="kde", or an ECDF plot with kind="ecdf".
We can also add a rug plot to any of these plots, by passing rug=True, to show the actual values of the data.
Don’t confuse displot() with distplot(). displot() is the new distplot() with better capabilities, and distplot() is deprecated starting from this Seaborn version.
With the new displot() function, Seaborn’s hierarchy of plotting functions now looks roughly like this, covering most of its plotting capabilities.
In addition to catplot() for categorical variables and relplot() for relational plots, we now have displot() covering distributional plots.
Let us get started by trying out some of the functionality. We can install the latest version of Seaborn with pip:

    pip install seaborn
Let us load seaborn and make sure we have Seaborn version 0.11.

    import seaborn as sns
    print(sns.__version__)
    0.11.0

We will use the Palmer Penguins dataset to illustrate some of the new functions and features of seaborn. The penguins data is readily available through seaborn, and we can load it using the load_dataset() function.

    penguins = sns.load_dataset("penguins")


    penguins.head()

      species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
    0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
    1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
    2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
    3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
    4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female

We can create histograms with Seaborn’s histplot() function, KDE plots with kdeplot(), and ECDF plots with ecdfplot(). However, in this post we primarily use displot() to illustrate Seaborn’s new capabilities.
Histograms with Seaborn displot()
Let us make a simple histogram with Seaborn’s displot() function.

    # displot() is figure-level and creates its own figure,
    # so we set the size with height and aspect instead of plt.figure()
    g = sns.displot(data=penguins, x="body_mass_g", bins=25, height=8, aspect=1.25)
    g.savefig("Seaborn_histogram_with_displot.png", format="png", dpi=150)

Here we have also specified the number of bins in the histogram.
We can also color the histogram by a variable and create overlapping histograms.

    # displot() creates its own figure; size it with height and aspect
    g = sns.displot(data=penguins, x="body_mass_g", hue="species", bins=25,
                    height=8, aspect=1.25)
    g.savefig("Seaborn_overlapping_histogram_hue_with_displot.png", format="png", dpi=150)

In this example, we color penguins’ body mass by species.
Faceting with Seaborn displot()
With the “col” argument we can create “small multiples”, or facets: multiple plots of the same type, each drawn from the subset of the data that shares one value of a variable.

    # height and aspect apply to each facet, not to the whole figure
    g = sns.displot(data=penguins, x="body_mass_g", col="species", bins=25,
                    height=4, aspect=1)
    g.savefig("Seaborn_facetting_histogram_col_with_displot.png", format="png", dpi=150)

Here, we have faceted by the values of the penguins’ species in our data set.
Density plot with Seaborn’s displot()
Let us use displot() to create a density plot using the kind="kde" argument. Here we also color by the species variable using the “hue” argument.

    # displot() creates its own figure; size it with height and aspect
    g = sns.displot(data=penguins, x="body_mass_g", hue="species", kind="kde",
                    height=8, aspect=1.25)
    g.savefig("Seaborn_kernel_density_plot_with_displot.png", format="png", dpi=150)

Check out the Seaborn documentation; the new version has new ways to make density plots.
ECDF Plot with Seaborn’s displot()
One of my personal highlights of the Seaborn update is the availability of a function to make ECDF plots. The ECDF, or Empirical Cumulative Distribution Function, is a great alternative way to visualize distributions.
In an ECDF plot, the x-axis corresponds to the range of data values for the variable, and on the y-axis we plot the proportion (or count) of data points that are less than or equal to the corresponding x-axis value.
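To make the definition concrete, here is a small NumPy sketch that computes those proportions by hand. The helper name `ecdf` is hypothetical and this is not how seaborn implements ecdfplot(); it only mirrors the definition above.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and, for each, the proportion of data <= that value."""
    data = np.sort(np.asarray(values, dtype=float))
    data = data[~np.isnan(data)]  # ignore missing values
    # side="right" counts how many observations are <= each value, ties included
    counts = np.searchsorted(data, data, side="right")
    return data, counts / data.size

x, y = ecdf([3.0, 1.0, 2.0, 2.0])
# x is [1.0, 2.0, 2.0, 3.0] and y is [0.25, 0.75, 0.75, 1.0]
```

Plotting y against x as a step function gives the same kind of curve that kind="ecdf" draws.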
Unlike histograms and density plots, an ECDF plot lets us visualize the data directly, without choosing a smoothing parameter such as the number of bins. Its usefulness is most apparent when you have multiple distributions to compare.
A potential disadvantage is that the relationship between the appearance of the plot and the basic properties of the distribution (such as its central tendency, variance, and the presence of any bimodality) may not be as intuitive.
Let us make an ECDF plot with displot() using kind="ecdf". Here we make an ECDF plot of one variable and color it by the values of another variable.

    # displot() creates its own figure; size it with height and aspect
    g = sns.displot(data=penguins, x="body_mass_g", hue="species", kind="ecdf",
                    height=8, aspect=1.25)
    g.savefig("Seaborn_ecdf_plot_with_displot.png", format="png", dpi=150)

Bivariate KDE plot and Histogram with displot()
With kdeplot(), we can also make bivariate density plots. In this example, we use displot() with kind="kde" to make a bivariate density (contour) plot.

    # Passing both x and y makes the plot bivariate
    g = sns.displot(data=penguins, x="body_mass_g", y="bill_depth_mm",
                    kind="kde", hue="species", height=8, aspect=1.25)
    g.savefig("Seaborn_displot_bivariate_kde_contour.png", format="png", dpi=150)

We can also make a bivariate histogram, either with displot() using the kind="hist" option or with histplot() directly.

    # Passing both x and y makes the histogram bivariate
    g = sns.displot(data=penguins, x="body_mass_g", y="bill_depth_mm",
                    kind="hist", hue="species", height=8, aspect=1.25)
    g.savefig("Seaborn_displot_bivariate_hist.png", format="png", dpi=150)

New features in Seaborn jointplot()
With Seaborn 0.11, jointplot() has also gained some nice features. Now jointplot() can take “hue” as an argument to color data points by a variable.

    sns.jointplot(data=penguins,
                  x="body_mass_g",
                  y="bill_depth_mm",
                  hue="species")

jointplot() also gains a way to plot a bivariate histogram on the joint axes and univariate histograms on the marginal axes, using the kind="hist" argument.

    sns.jointplot(data=penguins,
                  x="body_mass_g",
                  y="bill_depth_mm",
                  hue="species",
                  kind="hist")

Another big change that will help us write better data visualization code is that most Seaborn plotting functions will now require their parameters to be specified using keyword arguments. Otherwise, you will see a FutureWarning in v0.11.
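A minimal sketch of the keyword style, using an older function that accepted positional x and y. The synthetic `df` is an assumption for illustration, and the remark about the discouraged form reflects my understanding of the v0.11 behavior rather than a tested guarantee.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.normal(size=50), "y": rng.normal(size=50)})

# Discouraged style, expected to warn in v0.11: sns.scatterplot("x", "y", data=df)
# Preferred style: name every variable explicitly
ax = sns.scatterplot(data=df, x="x", y="y")
```

Naming the arguments also makes the call easier to read, since the role of each column is visible at the call site.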
As part of the update, Seaborn has also got spruced-up documentation of its capabilities. Check out the new documentation on the data structures accepted by Seaborn plotting functions. Some of the functions can take data in both wide and long forms. Currently, the distribution and relational plotting functions can handle both, and in future releases other Seaborn functions will accept the same data inputs.