In this tutorial, part of our Pandas 101 series, we will learn how to compute the cumulative sum of a column based on values from a grouping column in a Pandas dataframe. The Pandas cumsum() function can compute the cumulative sum over a DataFrame. In this example, we are interested in getting the cumulative sum of just one column by […]
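As a rough illustration of the idea in this excerpt, here is a minimal sketch of a per-group cumulative sum. The dataframe and the column names `group`, `value`, and `cumulative_value` are hypothetical and are not taken from the original post.

```python
import pandas as pd

# Hypothetical example data: "group" is the grouping column,
# "value" is the column we want to accumulate.
df = pd.DataFrame({
    "group": ["a", "a", "b", "a", "b"],
    "value": [1, 2, 10, 3, 20],
})

# Running total of "value", restarted separately within each group.
# The result keeps the original row order and index.
df["cumulative_value"] = df.groupby("group")["value"].cumsum()

print(df)
```

Selecting the single column before calling cumsum() keeps the other columns out of the calculation, which matches the "just one column" case the excerpt describes.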
13 Tips to Randomly Select Rows with tidyverse
In this post, we will learn ways to randomly sample rows from a data frame that are useful in most common scenarios. Tidyverse has a few options to randomly sample rows from a dataframe. slice_sample() in dplyr is the currently recommended function for randomly selecting rows. The older function in dplyr, sample_n(), for […]
3 Different ways to add regression line in ggplot2
In this post, we will learn how to add a simple regression line in three different ways to a scatter plot made with ggplot2 in R. This is something I have to google almost every time, so here is the post recording the options to add a linear regression line. We will use Palmer penguin data to […]
dplyr matches(): select columns using regular expression
This quick post has an example using a neat dplyr function, matches(), to select columns using regular expressions. dplyr has a number of helper functions, contains(), starts_with() and others, for selecting columns based on certain conditions. For example, if you are interested in selecting columns based on what their names start with, you can use the starts_with() function. However, […]
Pandas pipe function in Pandas: performing PCA
The Pandas pipe function can help us chain together functions that take either a dataframe or a series as input. In this introductory tutorial, we will learn how to use the Pandas pipe method to simplify code for data analysis. We start with a dataframe as input and do a series of analyses such that each step takes […]
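To make the chaining idea concrete, here is a minimal sketch of pipe() with two small steps. The helper functions `drop_missing` and `center_columns` and the example data are hypothetical; they stand in for the preprocessing steps of an analysis and are not the PCA code from the original post.

```python
import pandas as pd

def drop_missing(df):
    """Hypothetical step: remove rows that contain missing values."""
    return df.dropna()

def center_columns(df, columns):
    """Hypothetical step: subtract each listed column's mean from it."""
    out = df.copy()
    out[columns] = out[columns] - out[columns].mean()
    return out

# Hypothetical example data with one missing value in "x".
df = pd.DataFrame({
    "x": [1.0, 2.0, None, 3.0],
    "y": [10.0, 20.0, 30.0, 60.0],
})

# Each pipe() call passes the current dataframe as the first argument
# of the function; extra arguments are forwarded after it.
result = df.pipe(drop_missing).pipe(center_columns, columns=["x", "y"])

print(result)
```

Written this way, the steps read top to bottom in the order they run, instead of as nested calls such as center_columns(drop_missing(df), columns=["x", "y"]).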