When working with high-dimensional data, preprocessing and normalization are key steps in data analysis. Quantile normalization is one such statistical method that can be useful in analyzing high-dimensional datasets. One of the main goals of performing a normalization such as quantile normalization is to transform the raw data such that we can remove any […]
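As a rough illustration of what quantile normalization does, here is a minimal sketch, not necessarily the approach taken in the full post. The helper name `quantile_normalize` and the toy data are assumptions made for this example, and ties are broken by order of appearance for simplicity.

```python
import numpy as np
import pandas as pd

def quantile_normalize(df):
    """Hypothetical helper: give every column the same distribution."""
    # Sort each column independently, then average across columns
    # to get one reference value per rank.
    sorted_vals = np.sort(df.to_numpy(dtype=float), axis=0)
    rank_means = sorted_vals.mean(axis=1)
    # Rank each value within its column (0-based, ties by first appearance).
    ranks = df.rank(method="first").astype(int) - 1
    out = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
    for col in df.columns:
        # Replace each value with the reference value for its rank.
        out[col] = rank_means[ranks[col].to_numpy()]
    return out

# Toy data, made up for illustration: rows are features, columns are samples.
raw = pd.DataFrame({
    "s1": [5, 2, 3, 4],
    "s2": [4, 1, 4, 2],
    "s3": [3, 4, 6, 8],
})
normalized = quantile_normalize(raw)
print(normalized)
```

After this transformation every column contains the same set of values, only arranged according to the ranks in the original column.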
Python
Getting Started with Pandas Groupby
The Pandas groupby() function is one of the most useful functions in the library, enabling a wide range of data-munging activities. A simple use case of groupby() is splitting a bigger dataframe into multiple smaller dataframes by a single variable in the dataframe. Typically, after grouping by a variable, we perform some computations on each […]
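A minimal sketch of that split-then-compute pattern follows. The column names and values are made up for illustration and are not taken from the full post.

```python
import pandas as pd

# Toy data, assumed for this example only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "lifeExp": [70.0, 74.0, 80.0, 78.0, 60.0],
})

# Group the bigger dataframe by a single variable.
grouped = df.groupby("continent")

# Iterating over the groupby object yields (group name, smaller dataframe) pairs.
pieces = {name: group for name, group in grouped}

# A typical follow-up computation on each group: the mean of one column.
mean_life = grouped["lifeExp"].mean()
print(mean_life)
```

Each entry in `pieces` is an ordinary dataframe holding only the rows for one continent, which can be handy for inspecting what a grouping actually contains.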
Fun with Pandas Groupby, Aggregate, Multi-Index and Unstack
This post is titled “fun with Pandas Groupby, aggregate, and unstack”, but it addresses some of the pain points I face when doing mundane data-munging activities. Every time I run into them, I start from scratch and solve them in a different way. The purpose of this post is to record at least a couple of […]
How To Insert a Column at Specific Location in Pandas DataFrame?
In this post, we will learn how to insert a column at a specific location in a Pandas dataframe. We will use the Pandas insert() function to insert a column into a DataFrame at a specified location with a specific name. Let us create a data frame using NumPy’s random module. Our data frame looks like this. We have […]
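A minimal sketch of the idea, assuming a small made-up data frame (the column names, seed, and values here are illustrative and not those of the full post):

```python
import numpy as np
import pandas as pd

# Build a small data frame from NumPy's random module (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(4, 3)), columns=["A", "B", "C"])

# insert(loc, column, value) modifies the data frame in place:
# loc is the 0-based position of the new column.
df.insert(1, "new_col", [10, 20, 30, 40])
print(df.columns.tolist())
```

Note that insert() changes the data frame in place and returns None, so its result should not be assigned back to `df`.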
Linear Regression Using Matrix Multiplication in Python Using NumPy
Linear regression is one of the most commonly used statistical techniques for understanding the linear relationship between two or more variables. It is such a common technique that there are a number of ways to perform linear regression analysis in Python. In this post we will do linear regression analysis, kind of from scratch, using matrix […]
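One common way to do this with matrices is the normal equation, beta = (XᵀX)⁻¹Xᵀy. The sketch below assumes that formulation and uses noise-free toy data; the full post may proceed differently.

```python
import numpy as np

# Toy data generated from y = 2 + 3x with no noise (assumed for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix: a column of ones for the intercept, then the predictor.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
intercept, slope = beta
print(intercept, slope)
```

For larger or ill-conditioned problems, `np.linalg.lstsq(X, y, rcond=None)` is usually a more numerically stable way to get the same coefficients than forming the inverse explicitly.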