20 Free Online Books to Learn R and Data Science

If you are interested in learning Data Science with R, but not interested in spending money on books, you are definitely in a good place. There are a number of fantastic books and resources available online for free from top creators and scientists. Here are 13 such free (so far) online data science books […]

3 Basic Commands to Manipulate NumPy 2d-arrays

NumPy, or Numerical Python, is one of the packages in Python for all things computing with numerical values. Learning NumPy makes it much easier to compute with multi-dimensional arrays and matrices. A huge collection of very useful mathematical functions available to operate on these arrays makes it one of the most powerful environments […]

How To Separate a Column into Multiple Rows in R?

I just came across a useful little function in tidyr called separate_rows(). Often you may have a data frame with a column containing multiple pieces of information concatenated together with a delimiter. For example, we might have a data frame with members of a family in a column separated by a delimiter. Here is a pictorial representation of […]

How To Reshape Pandas Dataframe with melt and wide_to_long()?

Reshaping data frames into tidy format is probably one of the most frequent things you would do in data wrangling. A data frame is tidy when it satisfies the following rules: each variable in the data set is placed in its own column; each observation is placed in its own row; each value is placed […]
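As a rough illustration of the two functions named in the title, here is a minimal sketch. The data frame and its column names (country, lifeExp_1952, lifeExp_2007) are made up for this example and are not taken from the post.

```python
import pandas as pd

# Hypothetical wide-form data: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "lifeExp_1952": [40.0, 50.0],
    "lifeExp_2007": [60.0, 70.0],
})

# melt() stacks the year columns into (variable, value) pairs,
# keeping the original column names as values of "variable".
tidy = wide.melt(id_vars="country", var_name="variable", value_name="lifeExp")

# wide_to_long() additionally splits the stub name ("lifeExp")
# from the numeric suffix, which becomes the "year" column.
long = pd.wide_to_long(
    wide, stubnames="lifeExp", i="country", j="year", sep="_"
).reset_index()

print(tidy)
print(long)
```

Both results have one row per country and year; the difference is that wide_to_long() parses the year out of the column name, while melt() leaves that step to you.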

Introduction to Sparse Matrices in R

Often you may deal with large matrices that are sparse, with only a few non-zero elements. In such scenarios, keeping the data in a full dense matrix and working with it is not efficient. A better way to deal with such sparse matrices is to use the special data structures that allow storing the sparse data […]

Book Review: Fundamentals of Data Visualization

Finally got a chance to write down quick thoughts on Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures by Claus Wilke. ICYMI, Fundamentals of Data Visualization is a fantastic book on data visualization that was developed in the open and is freely available, and just recently the physical book became available for purchase. I have […]

Singular Value Decomposition (SVD) in Python

Matrix decomposition by Singular Value Decomposition (SVD) is one of the most widely used methods for dimensionality reduction. For example, Principal Component Analysis often uses SVD under the hood to compute principal components. In this post, we will work through an example of doing SVD in Python. We will use gapminder data in wide form to […]
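A minimal sketch of SVD with NumPy, assuming a small random matrix as a stand-in for the gapminder data (the post's actual data and steps are not shown in this excerpt):

```python
import numpy as np

# Stand-in data: 6 observations, 4 variables (not the gapminder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Center each column, as is usual before PCA-style analysis.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Multiplying the factors back together recovers the centered matrix.
X_rebuilt = U @ np.diag(s) @ Vt

# Coordinates of each observation on the first k components.
k = 2
scores = U[:, :k] * s[:k]
```

Keeping only the first k columns of U and the first k singular values gives the reduced-dimension representation.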

How To Do PCA in tidyverse Framework?

In an earlier post, we saw a tutorial on how to do PCA in R using the gapminder data set. Another interesting way of doing PCA is to follow the tidyverse framework. In this post, we will see an example of doing PCA using gapminder data in a tidy framework. Being the first attempt to […]

How To Create a Column Using Condition on Another Column in Pandas?

Often while cleaning data, one might want to create a new variable or column based on the values of another column using conditions. In this post we will see two different ways to create a column based on values of another column using conditional statements. First we will use NumPy’s little-known function where to […]
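A minimal sketch of the np.where approach mentioned above. The column names (lifeExp, group), the threshold of 50, and the labels are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column name and values are illustrative only.
df = pd.DataFrame({"lifeExp": [43.8, 76.4, 59.6]})

# np.where(condition, value_if_true, value_if_false) is evaluated
# element-wise, so it yields one label per row.
df["group"] = np.where(df["lifeExp"] >= 50, "high", "low")

print(df)
```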

Empirical cumulative distribution function (ECDF) in Python

Histograms are a great way to visualize a single variable. One of the problems with histograms is that one has to choose the bin size. With the wrong bin size, the distribution of your data might look very different. In addition to bin size, histograms may not be a good option to visualize distributions of multiple variables […]
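For reference, an ECDF can be computed without choosing any bin size. The helper below (ecdf is a name picked for this sketch, not a library function) sorts the values and assigns each one the fraction of observations less than or equal to it:

```python
import numpy as np

def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(values))
    # The i-th smallest value (1-based) has cumulative probability i / n.
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf([3.0, 1.0, 2.0, 4.0])
print(x, y)
```

Plotting y against x as points or as a step curve gives the ECDF, and several variables can be overlaid on the same axes.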

How To Randomly Add NaN to Pandas Dataframe?

Sometimes while testing a method, you might want to create a Pandas dataframe with NaNs randomly distributed. In this post we will see an example of how to introduce NaNs randomly in a data frame with Pandas. Let us load the packages we need. Let us use gapminder data in wide form to introduce NaNs […]
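One possible way to do this, sketched on a small random data frame rather than the gapminder data, is to build a boolean mask and pass it to DataFrame.mask(). The 20% rate and the column names are assumptions for illustration, and the post itself may use a different approach:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Stand-in numeric data frame (not the gapminder data).
df = pd.DataFrame(rng.normal(size=(5, 4)), columns=list("abcd"))

# Boolean mask of the same shape: True marks cells to blank out.
# Each cell is selected independently with probability 0.2.
nan_mask = rng.random(df.shape) < 0.2

# mask() replaces cells where the condition is True with NaN
# and returns a new data frame, leaving df unchanged.
df_nan = df.mask(nan_mask)

print(df_nan)
```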