If you are a beginner learning data science, understanding probability distributions will be extremely useful. One of the best ways to understand probability distributions is to simulate random numbers, that is, to generate random variables from a specific probability distribution, and then visualize them (see 9 Most Commonly Used Probability Distributions). There are at least two ways to draw samples […]
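The excerpt stops before naming its two ways of drawing samples. As a minimal sketch, assuming NumPy 1.17+ and its `Generator` API, here are two common approaches, which may or may not be the ones the post goes on to describe: calling a distribution method such as `rng.normal` directly, and inverse transform sampling from uniform draws. A crude text histogram stands in for a plot so the example needs nothing beyond NumPy.

```python
import numpy as np

# Seeded generator so the draws are reproducible
rng = np.random.default_rng(seed=42)
n_samples = 10_000

# Approach 1: call the generator's built-in distribution methods
normal_draws = rng.normal(loc=0.0, scale=1.0, size=n_samples)
binomial_draws = rng.binomial(n=10, p=0.5, size=n_samples)
poisson_draws = rng.poisson(lam=3.0, size=n_samples)

# Approach 2: inverse transform sampling.
# For an exponential with rate lam, the inverse CDF is F^{-1}(u) = -ln(1 - u) / lam,
# so pushing uniform draws on [0, 1) through it yields exponential draws.
lam = 2.0
u = rng.uniform(low=0.0, high=1.0, size=n_samples)
exponential_draws = -np.log(1.0 - u) / lam

# Sample means should land close to the theoretical means
print(f"normal mean      {normal_draws.mean():.3f} (expected 0)")
print(f"binomial mean    {binomial_draws.mean():.3f} (expected n*p = 5)")
print(f"poisson mean     {poisson_draws.mean():.3f} (expected lam = 3)")
print(f"exponential mean {exponential_draws.mean():.3f} (expected 1/lam = 0.5)")

# Crude text histogram of the normal draws: one '#' per 100 samples in each bin
counts, edges = np.histogram(normal_draws, bins=10)
for count, left in zip(counts, edges[:-1]):
    print(f"{left:6.2f} | {'#' * int(count // 100)}")
```

With a matplotlib installation, `plt.hist(normal_draws, bins=30)` would give the usual graphical view of the same draws.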
5 Big Ideas Behind Tidy Evaluation
Ever wondered how easy it is to write dataframe manipulation code without repeating yourself while using dplyr? For example, if you are filtering a dataframe, you simply write instead of writing like this, where you need to refer to the dataframe multiple times and use “$” to access variables in the dataframe. The reason why […]
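Tidy evaluation itself is an R mechanism, so the following is only a loose analogy, not an example of dplyr. As a sketch with a hypothetical toy dataframe, pandas shows the same contrast: boolean indexing repeats the dataframe name for every column reference, while `DataFrame.query` lets you refer to columns by bare name inside an expression string.

```python
import pandas as pd

# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "species": ["a", "b", "a", "c"],
    "mass": [3.1, 5.4, 2.8, 6.0],
})

# Repeats the dataframe name, similar in spirit to base R's df[df$mass > 3, ]
heavy_explicit = df[df["mass"] > 3]

# Columns referenced by bare name, loosely similar in spirit to filter(df, mass > 3)
heavy_query = df.query("mass > 3")

# A variable from the calling scope is referenced with "@"
threshold = 3
heavy_param = df.query("mass > @threshold")

print(heavy_query)
```

The string-based `query` approach is less flexible than tidy evaluation (expressions are strings, not captured code), but it illustrates the same goal of not repeating the dataframe name.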
Data Science Professional Certificate Program from Harvard/edX
There are a number of really good ways to get started learning Data Science. I just came across this really nice Data Science certificate course from Harvard/edX. The Data Science certificate program offers a series of courses that cover the basics of Data Science: probability, statistical inference, regression, and machine learning. It uses R programming and […]
How To Filter Pandas Dataframe By Values of Column?
In this post, we will learn how to filter a Pandas dataframe by column values. More specifically, we will subset a Pandas dataframe based on one or more values of a specific column. In this tutorial, we will see SIX examples of filtering or selecting rows of a Pandas dataframe based on the values of one or more columns. […]
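A few common row-filtering patterns, sketched on a small hypothetical dataframe (the column names and values are made up for illustration and are not the post's own six examples): a single-value match, `isin` for several values, its negation with `~`, combined conditions with `&`, and the equivalent `DataFrame.query`.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "country": ["India", "Japan", "Chile", "Kenya", "Japan"],
    "year": [2007, 2007, 2002, 2007, 2002],
    "pop_millions": [1110.4, 127.5, 15.5, 35.6, 127.1],
})

# 1. Rows where a column equals a single value
japan = df[df["country"] == "Japan"]

# 2. Rows where a column matches any of several values
two_countries = df[df["country"].isin(["Japan", "Kenya"])]

# 3. Rows where a column matches none of those values (negate the mask with ~)
others = df[~df["country"].isin(["Japan", "Kenya"])]

# 4. Conditions on several columns combined with & (and) or | (or);
#    each condition needs its own parentheses because & binds tighter than ==
recent_big = df[(df["year"] == 2007) & (df["pop_millions"] > 100)]

# 5. The same filter as a query string
recent_big_q = df.query("year == 2007 and pop_millions > 100")

print(recent_big)
```

Every pattern builds a boolean Series (a mask) aligned with the rows, and indexing with that mask keeps the rows where it is `True`.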
Pandas GroupBy: Introduction to Split-Apply-Combine
In a classic paper published in 2011, Hadley Wickham asked, “What do we do when we analyze data? What are common actions and what are common mistakes?” He then went on to spell out one of the most common strategies used in data analysis: Split-Apply-Combine. Intuitively, while solving a big problem, […]
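To make the three steps concrete, here is a minimal sketch on hypothetical data (the `team` and `score` columns are invented for illustration): the split, apply, and combine steps written out by hand, then the same result from a single pandas `groupby` call, plus several aggregations at once.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue", "green"],
    "score": [10, 7, 4, 9, 6],
})

# Split-apply-combine by hand:
# split: iterate over the groups; apply: compute each group's mean; combine: collect into a Series
manual = {}
for team, group in df.groupby("team"):
    manual[team] = group["score"].mean()
manual = pd.Series(manual, name="score").sort_index()

# The same three steps in one expression
by_team = df.groupby("team")["score"].mean()

# Several aggregations applied to each group at once
summary = df.groupby("team")["score"].agg(["mean", "sum", "count"])

print(by_team)
print(summary)
```

The one-line form is usually preferred because pandas performs the split and combine internally, and built-in aggregations such as `mean` are faster than a Python loop over groups.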


