Principal Component Analysis (PCA) is one of the most useful techniques in Exploratory Data Analysis for understanding data, reducing its dimensionality, and unsupervised learning in general. Let us quickly see a simple example of doing PCA in Python. Here we will use scikit-learn to do PCA on simulated data. Let […]
How To Plot Ridgeline Plots in R?
Ridgeline plots are a great way to visualize changes in multiple distributions/histograms over either time or space. They were initially called joyplots, for a brief time. The ggridges package from UT Austin professor Claus Wilke lets you make ridgeline plots in combination with ggplot. Here is how Claus describes the ridgeline plot with a brief […]
How to Make Boxplots in Python with Pandas and Seaborn?
The boxplot, introduced by John Tukey in his classic book Exploratory Data Analysis close to 50 years ago, is great for visualizing data distributions from multiple groups. A boxplot captures the summary of the data efficiently with a simple box and whiskers and allows us to compare easily across groups. Boxplots summarize sample data using the 25th, […]
How To Generate Random Numbers from Probability Distributions in R?
Understanding probability distributions and how to simulate random numbers from a specific probability distribution is very useful for understanding probability and using it effectively in data science. Here we will look at how to simulate/generate random numbers from the 9 most commonly used probability distributions in R and visualize the 9 probability distributions […]
Python’s Matplotlib Version 2.2 is here
Version 2.2 of Matplotlib, Python's core plotting library, is available now. The new version has a lot of new things to try, including a new method to automatically decide spacing between subplots. In the current version, one typically uses the tight_layout method to tighten the spaces around plot objects. The new method called […]




