How To Change Legend Title in ggplot2?

In this post, we will see examples of how to change the legend title in ggplot2. When you make a plot with ggplot2 and color/highlight data points by a variable in the input dataframe, ggplot2 uses the name of that variable as the legend title. However, sometimes you might want to change the legend title. […]

8 ggplot themes to make your plots look great

ggplot2 is awesome. It enables people to easily make high-quality data visualization plots. However, people who have spent a lot of time with ggplot2 have a love/hate relationship with the default ggplot2 theme, where a plot is on a grey background. The default ggplot2 theme is called theme_grey() or theme_gray(). In addition to the default theme, […]

How to Add Group-Level Summary Statistic as a New Column in Pandas?

In this post, we will see an example of adding the result of an aggregating function like mean/median, computed after groupby() on a specific column, as a new column. In other words, we might have group-level summary values for a column and we might want to add the summary values back to the original dataframe we computed group-level […]
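The excerpt is cut off before any code appears, so the following is only a minimal sketch of one common way to do this in Pandas, using `groupby()` followed by `transform()`. The toy data and the column names (`continent`, `lifeExp`, `mean_lifeExp`) are assumptions made for illustration and may not match what the post uses.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0, 85.0],
})

# transform("mean") computes the mean per group and returns one value
# per original row, so the result can be assigned as a new column.
df["mean_lifeExp"] = df.groupby("continent")["lifeExp"].transform("mean")
```

Unlike a plain aggregation, which returns one row per group, `transform()` keeps the shape of the original dataframe, which is why no merge step is needed here.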

How to Drop Rows Based on a Column Value in Pandas Dataframe?

In this post we will see examples of how to drop rows based on values of one or more columns in Pandas. The Pandas drop function makes it really easy to drop rows of a dataframe using index numbers or index names. We can use the Pandas drop function to drop rows and columns easily. Sometimes you […]
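As a rough sketch of the idea, assuming a small made-up dataframe (the columns `country`, `year`, and `pop` are hypothetical), rows can be dropped by a column value either with a boolean mask or by passing the matching index labels to `drop()`. The post itself may take a different route.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2002, 2007, 2002, 2007],
    "pop": [1, 2, 3, 4],
})

# Option 1: keep only the rows where the condition holds (boolean mask).
kept = df[df["year"] != 2002]

# Option 2: find the index labels of the unwanted rows and drop them.
dropped = df.drop(df[df["year"] == 2002].index)
```

Both options leave the original `df` unchanged and return a new dataframe containing only the rows for 2007.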

Pandas groupby: 13 Functions To Aggregate

The Pandas groupby function lets us apply the “Split-Apply-Combine” data analysis paradigm easily. Basically, with Pandas groupby, we can split a Pandas data frame into smaller groups using one or more variables. Pandas has a number of aggregating functions that reduce the dimension of the grouped object. In this post we will see examples of using 13 aggregating functions […]
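A minimal sketch of split-apply-combine with a few aggregating functions follows. The excerpt does not list which 13 functions the post covers, so the four used here (`count`, `mean`, `min`, `max`) and the toy data are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "lifeExp": [60.0, 70.0, 75.0, 85.0],
})

# Split by continent, apply several aggregations, combine into one table
# with one row per group and one column per aggregating function.
summary = df.groupby("continent")["lifeExp"].agg(["count", "mean", "min", "max"])
```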

How To Drop Duplicate Rows in Pandas?

One of the common data cleaning tasks is to make a decision on how to deal with duplicate rows in a data frame. If the whole row is duplicated exactly, the decision is simple. We can drop the duplicated row for any downstream analysis. Sometimes, you may have to make a decision if only part […]
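The two cases described above can be sketched with `drop_duplicates()`, assuming a small made-up dataframe (the columns `name` and `value` are hypothetical). Which copy to keep when only part of a row is duplicated is a judgment call; `keep="first"` is used here purely as an example.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are exact duplicates,
# rows 2 and 3 share a name but differ in value.
df = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "value": [1, 1, 2, 3],
})

# Case 1: drop rows that are duplicated across every column.
exact = df.drop_duplicates()

# Case 2: treat rows as duplicates if only "name" matches,
# keeping the first occurrence of each name.
partial = df.drop_duplicates(subset=["name"], keep="first")
```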

Introduction to Linear Regression in Python

Linear regression is one of the most commonly used statistical techniques for understanding the relationship between two quantitative variables (in the simplest case). Simple linear regression models the relationship between two variables X and Y, where X and Y are vectors with multiple values. For example, X could be how well each country is doing economically, like GDP […]

tidyr 1.0.0 is here. pivot_longer & pivot_wider replace spread & gather

tidyr version 1.0.0 is here with a lot of new changes. tidyr has been around for about five years and it has finally reached version 1.0.0. There are four big changes in the new version of tidyr. One of the biggest changes is the new functions pivot_longer() and pivot_wider() for reshaping tabular datasets. […]

How to Create Ordered Dictionary in Python?

The dictionary is one of the most useful core data structures in Python. Sometimes, you may want to create a dictionary and also maintain the order of items you inserted when you are iterating the keys. Python’s collections module has OrderedDict that lets you create an ordered dictionary. Let us see an example of […]
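A minimal sketch of `OrderedDict` from the standard-library `collections` module is shown below. The keys and values are made up for illustration, and the post's own example may differ.

```python
from collections import OrderedDict

# Insert keys in a deliberately non-alphabetical order.
od = OrderedDict()
od["b"] = 2
od["a"] = 1
od["c"] = 3

# Iteration follows insertion order, not key order.
keys = list(od.keys())

# OrderedDict also supports reordering, e.g. moving a key to the end.
od.move_to_end("b")
keys_after = list(od.keys())
```

Since Python 3.7, the built-in `dict` also preserves insertion order, so `OrderedDict` is mainly useful for its extra methods such as `move_to_end()` and for its order-sensitive equality comparison.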

How to Make an R Package from Scratch using RStudio?

Creating your first R package from scratch can look really daunting at first. Modern tools like the RStudio IDE and the devtools R package make it a lot easier to get started and create a new R package. I recently came across the second edition of the R Packages book by Hadley Wickham and Jenny Bryan and it […]