Python and R Tips

Empirical cumulative distribution function (ECDF) in Python

May 17, 2019 by cmdlinetips

Histograms are a great way to visualize a single variable. One of the problems with histograms is that one has to choose the bin size. With a wrong bin size your data distribution might look very different. In addition to bin size, histograms may not be a good option to visualize distributions of multiple variables […]
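
As a minimal sketch of the idea (not necessarily the code used in the full post), an ECDF can be computed with NumPy alone by sorting the values and assigning each one its cumulative fraction. The function name `ecdf` and the sample data are assumptions made for illustration.

```python
import numpy as np

def ecdf(data):
    """Return sorted values and their cumulative fractions (hypothetical helper)."""
    x = np.sort(np.asarray(data))          # sorted observations
    y = np.arange(1, x.size + 1) / x.size  # fraction of observations <= each x
    return x, y

# Made-up sample data, for illustration only
x, y = ecdf([3.0, 1.0, 2.0, 2.0])
```

Plotting `y` against `x` as a step or scatter plot gives the ECDF, with no bin size to choose.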

How To Randomly Add NaN to Pandas Dataframe?

May 12, 2019 by cmdlinetips

In this post we will see an example of how to introduce missing values, i.e. NaNs, randomly into a data frame using Pandas. Sometimes while testing a method, you might want to create a Pandas dataframe with NaNs randomly distributed. Here we show how to do it. Let us load the packages we need. Let […]
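
One possible way to do this, sketched here under assumptions and not necessarily the approach taken in the full post, is to build a random boolean mask and pass it to `DataFrame.mask`. The column names, the data frame shape, and the 10% missing rate are all made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data frame: 100 rows, 4 numeric columns
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("ABCD"))

# Boolean mask with roughly 10% True entries (assumed missing rate)
nan_mask = rng.random(df.shape) < 0.1

# DataFrame.mask replaces entries with NaN wherever the mask is True
df_nan = df.mask(nan_mask)
```

The original `df` is left unchanged, so the two frames can be compared when testing how a method copes with missing values.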

How To Highlight Select Data Points with ggplot2 in R?

May 9, 2019 by cmdlinetips

The power of ggplot2 lies in making it easy to make great plots and to tweak them into exactly what one wants. Sometimes, one might want to highlight certain data points in a plot in a different color. Here we will see an example of highlighting specific data points in a plot. Let us first load […]

How to Implement Pandas Groupby operation with NumPy?

May 8, 2019 by cmdlinetips

Pandas’ GroupBy function is the bread and butter for many data munging activities. Groupby enables one of the most widely used paradigms, “Split-Apply-Combine”, for data analysis. Sometimes you will be working with NumPy arrays and may still want to perform groupby operations on the array. I just recently wrote a blog post inspired by Jake’s post on […]
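
To illustrate what a groupby on plain NumPy arrays can look like, here is a small sketch that uses `np.unique` with `return_inverse=True` together with `np.bincount`. This is one of several possible techniques and may differ from the one the full post describes; the `keys` and `values` arrays are made-up data.

```python
import numpy as np

# Made-up group labels and values
keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Split: map each key to an integer group index
unique_keys, group_idx = np.unique(keys, return_inverse=True)

# Apply and combine: per-group sum, count, and mean
group_sums = np.bincount(group_idx, weights=values)
group_counts = np.bincount(group_idx)
group_means = group_sums / group_counts
```

Each position in `group_sums` and `group_means` lines up with the key at the same position in `unique_keys`.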

Implementing K-means clustering in Python from Scratch

May 5, 2019 by cmdlinetips

K-means clustering is one of the commonly used unsupervised techniques in machine learning. K-means clustering partitions data into K distinct clusters. In a typical setting, we provide input data and the number of clusters K, and the k-means clustering algorithm assigns each data point to a distinct cluster. In this post, we […]
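
The assignment step described above can be sketched with the standard Lloyd iteration in NumPy. This is a minimal illustration, not the implementation from the full post: the function name `kmeans`, its parameters, the random initialization from data points, and the toy data are all assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd-style k-means sketch (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated groups of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = kmeans(X, k=2)
```

On well-separated data like this toy example, the first four points end up sharing one label and the last four the other; which group receives label 0 depends on the random initialization.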