In simpler statistical models, we typically assume our data came from a single distribution. For example, to model height, we can assume that each observation came from a single Gaussian distribution with some mean and variance. However, we are often in scenarios where that assumption is not valid and our data is more […]
Pandas filter(): Select Columns and Rows by Labels in a Dataframe
In this post, we will learn how to use the Pandas filter() function to subset a dataframe based on its column names and row indexes. Pandas has a number of ways to subset a dataframe, but the filter() function differs from the others in a key way: it does not filter a dataframe on its […]
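A minimal sketch of filter() selecting by labels rather than by cell values. The DataFrame, its column names and its row labels are made up for illustration. It shows the `items`, `like` and `regex` parameters, with `axis=0` switching the selection from columns to rows.

```python
import pandas as pd

# Small hypothetical DataFrame with string row labels
df = pd.DataFrame(
    {"height_cm": [170, 182, 165], "weight_kg": [65, 80, 58], "age": [34, 29, 41]},
    index=["alice", "bob", "carol"],
)

# Keep only the listed column names (for a DataFrame, filter() works on columns by default)
by_items = df.filter(items=["height_cm", "age"])

# Keep columns whose name contains the substring "_"
by_like = df.filter(like="_")

# Keep rows whose index label matches a regular expression
by_regex_rows = df.filter(regex="^b", axis=0)

print(by_items.columns.tolist())     # ['height_cm', 'age']
print(by_like.columns.tolist())      # ['height_cm', 'weight_kg']
print(by_regex_rows.index.tolist())  # ['bob']
```

The values in the cells play no part in any of these calls. Only the column names or row labels are matched.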
How To Delete Rows in a Pandas Dataframe
Pandas makes it easy to delete rows of a dataframe. There are multiple ways to delete rows or select rows from a dataframe. In this post, we will see how to use the drop() function to drop rows in Pandas by index name or index location. The Pandas drop() function can also be used to drop or delete […]
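Here is a short sketch of drop() removing rows by index name and by position. The data is hypothetical. Dropping by position is done by first looking up the labels at those positions with `df.index[...]`, because drop() itself works on labels.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels
df = pd.DataFrame(
    {"city": ["Austin", "Boston", "Chicago", "Denver"], "temp_f": [95, 72, 80, 68]},
    index=["a", "b", "c", "d"],
)

# Drop rows by index name (label); axis=0, i.e. rows, is the default
by_label = df.drop(["b", "d"])

# Drop rows by index location: get the labels at positions 0 and 2, then drop them
by_position = df.drop(df.index[[0, 2]])

# drop() returns a new DataFrame and leaves the original df unchanged
print(by_label.index.tolist())     # ['a', 'c']
print(by_position.index.tolist())  # ['b', 'd']
print(len(df))                     # 4
```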
7 Tips to Add Columns to a DataFrame with add_column() in tidyverse
Often while doing data analysis, one might need to add a new column or multiple columns to an existing data frame. In this post, we will learn how to add one or more columns to a dataframe in R. The tibble package in the tidyverse has a lesser-known but powerful function, add_column(). We will learn 6 tips to […]
How to Combine Year, Month, and Day Columns into a Single Date in Pandas
In this post, we will see how to combine columns containing year, month, and day into a single column of datetime type. We can combine multiple columns into a single date column in multiple ways. First, we will see how we can combine year, month, and day columns into a column of type datetime, while […]
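One possible approach is sketched here, using hypothetical data. It assumes the columns are already named `year`, `month` and `day`. `pd.to_datetime` can take a DataFrame with those column names and build one datetime value from each row. Columns with other names would have to be renamed first.

```python
import pandas as pd

# Hypothetical data with separate year, month and day columns
df = pd.DataFrame({"year": [2020, 2021, 2022], "month": [1, 6, 12], "day": [15, 30, 31]})

# pd.to_datetime assembles one datetime per row from the year/month/day columns
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

print(df["date"].dtype)                                # datetime64[ns]
print(df["date"].dt.strftime("%Y-%m-%d").tolist())     # ['2020-01-15', '2021-06-30', '2022-12-31']
```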