When working with high-dimensional data, preprocessing and normalizing the data are key steps in data analysis. Quantile normalization is one such statistical method that can be useful in analyzing high-dimensional datasets. One of the main goals of performing normalization, such as quantile normalization, is to transform the raw data such that we can remove any […]
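As a minimal sketch of the idea (not necessarily the code from the post), quantile normalization can be written in a few lines of NumPy. It assumes rows are features and columns are samples, and it breaks ties by position rather than averaging them. Each column is sorted, the row-wise mean of the sorted matrix becomes a shared reference distribution, and every value is replaced by the reference value at its rank. The function name `quantile_normalize` is hypothetical.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns of a 2-D array (rows = features, columns = samples)."""
    # Rank of each value within its own column (0 = smallest); ties are broken by position.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: the mean across samples of each row of the column-sorted matrix.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace every value with the reference value that sits at the same rank.
    return reference[ranks]


x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 5.0, 6.0],
              [4.0, 2.0, 8.0]])
print(quantile_normalize(x))
```

After normalization every column contains exactly the same set of values (the reference distribution), so the samples share one distribution while each feature keeps its within-sample rank.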
dplyr 1.0.0 Is Here: Quick Fun with summarise() and rowwise()
A new version of dplyr, version 1.0.0, is here. It was originally supposed to be available in early May and is finally out on CRAN. One of the cool things about dplyr 1.0.0 is its new logo. Jokes aside, dplyr 1.0.0 is loaded with new features, and Hadley Wickham has started teasing […]
Fun with Pandas Groupby, Aggregate, Multi-Index and Unstack
This post is titled “fun with Pandas Groupby, aggregate, and unstack”, but it addresses some of the pain points I face when doing mundane data-munging tasks. Every time I do this, I start from scratch and solve them in different ways. The purpose of this post is to record at least a couple of […]
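The excerpt does not show the post's own code, so here is a minimal sketch of a typical groupby, aggregate, and unstack pipeline in pandas. The DataFrame and its `continent`, `year`, and `pop` columns are made up for illustration. Grouping by two keys and passing a list of functions to `agg` yields a MultiIndex on both rows and columns; `unstack` then pivots one row level into the columns. Joining the column levels into flat strings is one common way to tidy the result, not necessarily the approach the post takes.

```python
import pandas as pd

# Hypothetical toy data: one row per observation.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia", "Europe"],
    "year": [2000, 2005, 2000, 2005, 2000, 2005],
    "pop": [10, 12, 7, 8, 5, 9],
})

# Group by two keys and compute two statistics: the rows get a
# (continent, year) MultiIndex and the columns a (pop, mean/sum) MultiIndex.
summary = df.groupby(["continent", "year"]).agg({"pop": ["mean", "sum"]})

# Move the inner row level ("year") into the columns to get a wide table.
wide = summary.unstack("year")

# Flatten the three-level column MultiIndex into strings such as "pop_mean_2000".
wide.columns = ["_".join(map(str, col)) for col in wide.columns]
print(wide)
```

The result has one row per continent and one column per statistic and year combination, which is usually easier to export or plot than the nested MultiIndex form.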
Linear Regression Using Matrix Multiplication in Python Using NumPy
Linear regression is one of the most commonly used statistical techniques for understanding the linear relationship between two or more variables. It is such a common technique that there are a number of ways to perform linear regression analysis in Python. In this post, we will do linear regression analysis, kind of from scratch, using matrix […]
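As a minimal sketch of linear regression through matrix operations (the standard normal-equations approach, which may differ in detail from the post's code), the coefficients satisfy (XᵀX)β = Xᵀy, where X is the design matrix with a leading column of ones for the intercept. The helper name `fit_ols` and the synthetic data are hypothetical. The sketch also swaps an explicit matrix inverse for `np.linalg.solve`, which gives the same estimate with better numerical behavior.

```python
import numpy as np


def fit_ols(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [intercept, slope_1, ...] by solving the normal equations (X^T X) beta = X^T y."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]  # treat a 1-D input as a single predictor column
    # Design matrix: a leading column of ones models the intercept.
    X = np.column_stack([np.ones(x.shape[0]), x])
    # Solve the linear system instead of computing inv(X.T @ X) explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic data drawn from y = 2 + 3x + noise (illustrative values only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

beta = fit_ols(x, y)
print(beta)  # estimates should land near [2, 3]
```

As a sanity check, the estimates should agree with NumPy's least-squares solver, `np.linalg.lstsq`, on the same design matrix.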
ggplot2 3.3.0 Is Here: Two New Features You Must Know
ggplot2, the R package that lets you create graphics using the Grammar of Graphics, has a new version. The new version of ggplot2, version 3.3.0, has lots of changes and is available on CRAN. Introducing ggplot2 v3.3.0: Thomas Lin Pedersen says that the new version “is packed with features, big and small” and a […]