The boxplot, introduced by John Tukey in his classic book Exploratory Data Analysis close to 50 years ago, is great for visualizing data distributions from multiple groups. A boxplot captures a summary of the data efficiently with a simple box and whiskers and makes it easy to compare across groups. Boxplots summarize sample data using the 25th, […]
Python
Python’s Matplotlib Version 2.2 is here
Version 2.2 of Matplotlib, Python’s core plotting library, is available now. The new version has a lot of new things to try, including a new method to automatically decide the spacing between subplots. In the current version, one typically uses the tight_layout method to tighten the spaces around plot objects. The new method called […]
How to Filter a Pandas Dataframe Based on Null Values of a Column?
Real datasets are messy and often contain missing data. Python’s pandas can easily handle missing data or NA values in a dataframe. A common task when dealing with missing data is to filter out the rows with missing values, which can be done in a few ways. One might want to filter the pandas dataframe based […]
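As a quick taste of the idea, here is a minimal sketch of filtering rows by whether one column is null. The toy dataframe and its column names ("name", "score") are made up for illustration and are not taken from the post itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are invented for this sketch.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [1.5, np.nan, 3.0, np.nan],
})

# Keep only the rows where "score" is missing.
missing = df[df["score"].isna()]

# Keep only the rows where "score" is present.
present = df[df["score"].notna()]

# Same result as `present`, using dropna restricted to a single column.
present_alt = df.dropna(subset=["score"])

print(missing)
print(present)
```

Boolean masks from isna and notna are handy when you need both halves of the split, while dropna with subset is a compact way to keep only the complete rows.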
Introduction to Sparse Matrices in Python with SciPy
What is a Sparse Matrix? Imagine you have a two-dimensional data set with 10 rows and 10 columns such that each element contains a value. We can also call such data a matrix; in this example it is a dense 10 x 10 matrix. Now imagine you have a 10 x 10 matrix with only […]
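To make the dense versus sparse contrast concrete, here is a small sketch using SciPy's CSR format. The choice of three non-zero entries and their positions is an assumption for illustration only, as is the choice of CSR over the other sparse formats SciPy offers.

```python
import numpy as np
from scipy import sparse

# A dense 10 x 10 matrix that is almost entirely zeros.
# The three non-zero entries below are arbitrary, chosen for illustration.
dense = np.zeros((10, 10))
dense[0, 3] = 1.0
dense[4, 4] = 2.0
dense[9, 1] = 3.0

# Convert to compressed sparse row (CSR) format, which stores only
# the non-zero values together with their positions.
csr = sparse.csr_matrix(dense)

print(csr.shape)  # logical shape is unchanged
print(csr.nnz)    # number of explicitly stored entries

# Converting back recovers the original dense array.
roundtrip = csr.toarray()
```

The sparse object keeps the same logical shape as the dense array but only stores the entries that are actually non-zero.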
Probability Distributions in Python with SciPy and Seaborn
If you are a beginner in data science, understanding probability distributions will be extremely useful. One of the best ways to understand probability distributions is to simulate random numbers, or generate random variables, from a specific probability distribution and visualize them. 9 Most Commonly Used Probability Distributions There are at least two ways to draw samples […]
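As one possible illustration of simulating from a distribution, the sketch below draws samples from a standard normal with scipy.stats and bins them into histogram counts. The distribution, sample size, seed, and bin count are all assumptions chosen for this example, and it is not meant to stand in for the specific approaches the post goes on to describe.

```python
import numpy as np
from scipy import stats

# Draw 10,000 samples from a normal distribution with mean 0 and
# standard deviation 1. The seed is fixed only for reproducibility.
samples = stats.norm.rvs(loc=0, scale=1, size=10_000, random_state=42)

# Summary statistics should land close to the chosen parameters.
print(samples.mean(), samples.std())

# Bin the samples; these counts are what a histogram plot would show.
counts, edges = np.histogram(samples, bins=30)
```

Passing the same samples to a plotting function such as Seaborn's histplot would turn the counts into a picture of the bell curve.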