The boxplot, introduced by John Tukey in his classic book Exploratory Data Analysis close to 50 years ago, is great for visualizing data distributions across multiple groups. A boxplot captures a summary of the data efficiently with a simple box and whiskers and makes it easy to compare groups. Boxplots summarize sample data using the 25th, […]
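The excerpt above is cut off after "25th", so the following is only a minimal sketch of the conventional quartile-based summary that a box is usually drawn from, not the post's own code. It assumes Python's standard `statistics` module. The helper name `box_summary` and the two sample groups are hypothetical, chosen purely for illustration.

```python
import statistics


def box_summary(values):
    """Return (q1, median, q3, iqr) for one sample.

    Assumption: quartiles use the "inclusive" interpolation method;
    other plotting tools may use slightly different rules.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3, q3 - q1


# Hypothetical groups, used only to show a side-by-side comparison.
groups = {
    "group_a": [2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 4.4],
    "group_b": [1.0, 1.9, 2.2, 2.8, 5.5, 6.0, 7.3],
}

for name, values in groups.items():
    q1, median, q3, iqr = box_summary(values)
    print(f"{name}: q1={q1:.2f} median={median:.2f} q3={q3:.2f} iqr={iqr:.2f}")
```

Printing the quartiles for each group gives the same comparison a side-by-side boxplot shows visually: where the middle half of each group sits and how wide it is.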
How To Generate Random Numbers from Probability Distributions in R?
Understanding probability distributions, and how to simulate random numbers from a specific distribution, is very useful for understanding probability and for using distributions effectively in data science. Here we will look at simulating random numbers from the 9 most commonly used probability distributions in R and visualizing the 9 probability distributions […]
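The post itself works in R and its excerpt does not list the nine distributions, so what follows is a rough Python analogue rather than the post's code. It is a minimal sketch using only the standard `random` and `statistics` modules. The three distributions, their parameters, the seed, and the sample size are all assumptions made for illustration.

```python
import random
import statistics

# Seeded generator so the draws are reproducible (the seed is arbitrary).
rng = random.Random(42)
n = 10_000  # number of draws per distribution (illustrative choice)

# Draw n values from three common distributions.
samples = {
    "normal(mean=0, sd=1)": [rng.gauss(0, 1) for _ in range(n)],
    "uniform(0, 1)": [rng.uniform(0, 1) for _ in range(n)],
    "exponential(rate=2)": [rng.expovariate(2) for _ in range(n)],
}

# Compare each sample mean with the theoretical mean (0, 0.5 and 0.5).
for name, values in samples.items():
    print(f"{name}: sample mean = {statistics.fmean(values):.3f}")
```

Checking the sample mean against the theoretical mean is a quick sanity test that the draws come from the intended distribution. A histogram of each sample would be the natural next step.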
Python’s Matplotlib Version 2.2 is here
Version 2.2 of Matplotlib, Python’s core plotting library, is available now. The new release has a lot of new things to try, including a new method to automatically decide spacing between subplots. In the current version, one typically uses the tight_layout method to tighten the spaces around plot objects. The new method called […]
Coursera to offer Masters Degree in Data Science
Coursera announced that it is teaming up with multiple universities, including Imperial College London, University of Illinois at Urbana-Champaign, University of Michigan and University of London, to launch online Master’s and Bachelor’s Degree programs. Two Master’s Degrees in Data Science from Coursera: Two of the degrees offered by Coursera are in Data Science. One is […]
Skimr: An R Package to Skim Summary Data Effortlessly
Exploring your data while doing analysis is extremely important. skimr, an R package from rOpenSci, presents summary statistics in a readable form, so you can quickly skim your data summary and understand it better. If you have not heard of rOpenSci, it is a non-profit initiative founded […]