15 Free Online Resources/Books to learn R and Data Science

If you are interested in learning Data Science with R, but not interested in spending money on books, you are definitely in a good place. There are a number of fantastic books and resources available online for free from top creators and scientists. Here are 13 such free (so far) online data science books […]

How To Assign Specific Colors to Boxplots in Seaborn?

Boxplots with actual data points are one of the best ways to visualize the distribution of multiple variables at the same time. Creating a beautiful plot with Boxplots in Python Pandas is very easy. In an earlier post, we saw a good example of how to create publication quality boxplots with Pandas and Seaborn. If […]

How To Collapse Multiple Text Columns in Dataframe Using Tidyverse?

Often you may have a data frame, where multiple columns are related and you may want to combine those related columns into a single column. In an earlier post, we saw how we can collapse a numerical data frame with related columns using Python. In this post, we consider the problem of collapsing or combining […]

How to Split a Text Column into Two Columns in Pandas?

Often you may have a text column in your pandas data frame and you may want to manipulate the text, for example by splitting the column into two columns in the data frame. For example, one of the columns in your data frame is full name and you may want to split it into first name and last […]
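As a minimal sketch of the idea, assuming a made-up data frame with a column named "full_name" (both the data and the column names are illustrative, not taken from the post), one way to split a text column is with the vectorized str.split method:

```python
import pandas as pd

# Hypothetical data; the column name "full_name" is an assumption.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"]})

# Split on the first space only (n=1) and expand the pieces into two columns.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)
```

Using n=1 keeps everything after the first space in the second column, which may or may not be what you want for names with more than two parts.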

String Manipulations in Pandas

Python is known for its ability to manipulate strings. Pandas extends Python’s ability to do string manipulations on a data frame by offering a suite of the most common string operations that are vectorized and are great for cleaning real world datasets. Let us see some simple examples of string manipulations in Pandas. Let us use gapminder […]
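To give a flavor of these vectorized operations, here is a small sketch. It uses a tiny made-up data frame rather than the gapminder data the post works with, and the column names are assumptions:

```python
import pandas as pd

# Hypothetical messy text column (not the gapminder data).
df = pd.DataFrame({"country": [" united states", "India ", "south africa"]})

# Chain vectorized string methods through the .str accessor:
# strip surrounding whitespace, then capitalize each word.
df["clean"] = df["country"].str.strip().str.title()

# Build a boolean column by testing each cleaned value for a substring.
df["has_south"] = df["clean"].str.contains("South")

print(df)
```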

How to Change Data Type for One or More Columns in Pandas Dataframe?

Sometimes when you create a data frame, some of the columns may be of mixed type, and you might see a warning like this: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False. We get this warning when Pandas tries to guess the type for each element of a column. For […]
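A minimal sketch of changing column types after the data frame already exists, assuming a made-up data frame whose columns were read in as strings (the names "id" and "score" are illustrative):

```python
import pandas as pd

# Hypothetical data where numbers were stored as text.
df = pd.DataFrame({"id": ["1", "2", "3"], "score": ["4.5", "n/a", "6.0"]})

# Change one or more columns at once by passing a {column: dtype} dictionary.
df = df.astype({"id": int})

# For columns with values that cannot be parsed, to_numeric with
# errors="coerce" turns the bad entries into NaN instead of raising.
df["score"] = pd.to_numeric(df["score"], errors="coerce")

print(df.dtypes)
```

When the types are known before loading, the dtype argument of read_csv, which the warning itself suggests, is another option.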

How to Collapse Multiple Columns in Pandas? Groupby with Dictionary

Often you may want to collapse two or more columns in a Pandas data frame into one column. For example, you may have a data frame with data for each year as columns and you might want to get a new column which summarizes multiple columns. One may need to have the flexibility of collapsing columns […]
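Here is one possible sketch of collapsing year columns with a dictionary, assuming made-up data and a hypothetical mapping from years to decades. It transposes the data frame and groups the rows, rather than grouping columns with axis=1, because recent pandas releases deprecate that argument; the original post may do it differently:

```python
import pandas as pd

# Hypothetical data with one column per year.
df = pd.DataFrame(
    {"1952": [1, 2], "1957": [3, 4], "2002": [5, 6], "2007": [7, 8]},
    index=["a", "b"],
)

# Dictionary mapping each original column to the group it collapses into.
mapping = {"1952": "1950s", "1957": "1950s", "2002": "2000s", "2007": "2000s"}

# Transpose so columns become rows, group rows by the dictionary,
# sum within each group, then transpose back.
collapsed = df.T.groupby(mapping).sum().T

print(collapsed)
```

Swapping sum for another aggregation such as mean gives a different summary of the same groups.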

How To Convert a Column to Row Name/Index in Pandas?

Pandas has a method set_index to convert a column in a Pandas dataframe into a row name or row index. Let us see an example of converting a column into row names in Pandas. Let us load pandas as “pd”. Let us use real-world gapminder data from vega_datasets. Convert a Column to Row Name Let us convert the […]
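A small sketch of set_index, using a made-up data frame instead of the gapminder data from vega_datasets (the column names and values are placeholders, not real figures):

```python
import pandas as pd

# Hypothetical data; values are placeholders.
df = pd.DataFrame({"country": ["Chile", "Kenya"], "pop": [19.1, 53.8]})

# set_index returns a new data frame with the column moved into the row index.
indexed = df.set_index("country")

# Rows can now be looked up by label.
print(indexed.loc["Kenya", "pop"])
```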

Short videos to Learn Basics of Probability and Statistics

Basic concepts in Probability and statistics are at the heart of Data Science. And there is no better person than Prof. Joe Blitzstein to learn Probability and statistics from. Joe has come up with amazing short videos explaining the basic concepts for his new course. If you are a data science beginner or a veteran, Joe’s short videos […]

ggplot2 Version 3.0.0 Brings Tidy Evaluation to ggplot

RStudio has unveiled major updates to ggplot2 with the new version 3.0.0. The new ggplot2 version has been available on CRAN for about two weeks. ggplot2 3.0.0 was originally announced as ggplot2 2.3.0, but big updates led RStudio to bump the version number to 3.0.0. One of the biggest additions in the new version is that ggplot2 […]

Publication Quality Graphics in #rstats

The visualization guru, Edward Tufte, known for all things visualization, tweeted that #rstats alone is not good enough for publication quality graphics. He claimed “Publication-quality work requires: R + Adobe Illustrator + reasoning about words on graphics + respect for audience/readers/viewers”. #Rstats coders and users just can’t do words on graphics and typography. Proof: […]