The power of ggplot2 lies in making it easy to create great plots and to tweak them into exactly what one wants. Sometimes, one might want to highlight certain data points in a plot in a different color. Here we will see an example of highlighting specific data points in a plot. Let us first load […]
How to Implement Pandas Groupby operation with NumPy?
Pandas’ GroupBy function is the bread and butter of many data munging activities. Groupby enables one of the most widely used paradigms for data analysis, “Split-Apply-Combine”. Sometimes you will be working with NumPy arrays and may still want to perform groupby operations on the array. I just recently wrote a blog post inspired by Jake’s post on […]
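As a rough illustration of the idea in this excerpt, here is a minimal sketch of a split-apply-combine aggregation using only NumPy. It is not the code from the linked post: the function name `groupby_sum_mean` and the sample arrays are hypothetical, and the approach (`np.unique` with `return_inverse` plus `np.bincount`) is just one of several ways to do this.

```python
import numpy as np

# Hypothetical example data: group labels and the values to aggregate.
keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])


def groupby_sum_mean(keys, values):
    """Return the unique keys with the per-group sum and mean of values."""
    # Split: map every key to an integer group id (0 .. n_groups - 1).
    unique_keys, group_ids = np.unique(keys, return_inverse=True)
    n_groups = len(unique_keys)
    # Apply: bincount with weights adds up the values within each group.
    sums = np.bincount(group_ids, weights=values, minlength=n_groups)
    counts = np.bincount(group_ids, minlength=n_groups)
    # Combine: one aggregated value per unique key, in sorted key order.
    return unique_keys, sums, sums / counts


unique_keys, sums, means = groupby_sum_mean(keys, values)
for key, total, mean in zip(unique_keys, sums, means):
    print(key, total, mean)
```

This variant only covers sums, counts, and means. For arbitrary aggregation functions, sorting the array by key and splitting it at the group boundaries is a common alternative.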
Implementing K-means clustering in Python from Scratch
K-means clustering is one of the commonly used unsupervised techniques in machine learning. It partitions data into K distinct clusters. In a typical setting, we provide the input data and the number of clusters K, and the k-means clustering algorithm assigns each data point to a distinct cluster. In this post, we […]
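To make the description above concrete, here is a minimal sketch of the standard k-means loop (Lloyd's algorithm) in NumPy. It is an illustration under stated assumptions, not the implementation from the linked post: the function name `kmeans`, its parameters, the random initialization from data points, and the toy data are all hypothetical choices.

```python
import numpy as np


def assign_labels(X, centroids):
    """Assign each row of X to the index of its nearest centroid."""
    # Pairwise Euclidean distances, shape (n_samples, k).
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids as k distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest centroid for every point.
        labels = assign_labels(X, centroids)
        # Update step: each centroid moves to the mean of its points.
        # A centroid that lost all its points is left where it is.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels are computed against the returned centroids.
    return assign_labels(X, centroids), centroids


# Toy data (hypothetical): two well-separated 2-D blobs.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(20, 2)),
    data_rng.normal(10.0, 0.5, size=(20, 2)),
])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The result depends on the initial centroids, so in practice the algorithm is usually run from several random starts and the best clustering is kept.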
R Graphics Cookbook Second Edition is Available for Free
Winston Chang from RStudio quietly announced last week that the second edition of his popular R Graphics Cookbook: Practical Recipes for Visualizing Data is now available to buy. Not only that, the book is also available online for free at https://r-graphics.org/. Winston Chang’s first edition of R Graphics Cookbook was the first R book I […]
PCA example using prcomp in R
In this tutorial, we will learn how to perform PCA in R using the prcomp() function. Principal Component Analysis, aka PCA, is one of the commonly used approaches for unsupervised learning/dimensionality reduction. It is a fantastic tool to have in your data science/machine learning arsenal. You will be surprised how often the […]