Histograms are a great way to visualize a single variable. One problem with histograms is that you have to choose the bin size, and with a poorly chosen bin size the distribution of your data can look very different. In addition to bin size, histograms may not be a good option to visualize distributions of multiple variables […]
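To make the bin-size point concrete, here is a minimal sketch that bins the same sample twice with NumPy. The data are made up for illustration (two normal clusters), and the bin counts 2 and 30 are arbitrary choices, not values from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up bimodal sample: two normal clusters centered at -2 and 2
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])

# Same data, two different numbers of bins
counts_coarse, edges_coarse = np.histogram(data, bins=2)
counts_fine, edges_fine = np.histogram(data, bins=30)

print("2 bins: ", counts_coarse)
print("30 bins:", counts_fine)
```

With only two bins the gap between the clusters can be hidden entirely, while the finer binning keeps it visible.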
Python
How To Randomly Add NaN to Pandas Dataframe?
In this post we will see an example of how to introduce missing values, i.e. NaNs, randomly in a data frame using Pandas. Sometimes while testing a method, you might want to create a Pandas dataframe with NaNs randomly distributed. Here we show how to do it. Let us load the packages we need Let […]
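One way this could be done is sketched below; it is not necessarily the approach the full post takes. The column names, the 10 x 4 shape, and the 20% missing rate are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy dataframe with no missing values (shape and names are illustrative)
df = pd.DataFrame(rng.normal(size=(10, 4)), columns=list("ABCD"))

# Boolean mask: each cell is True with probability of roughly 0.2
nan_mask = rng.random(df.shape) < 0.2

# DataFrame.mask replaces values with NaN wherever the mask is True
df_with_nan = df.mask(nan_mask)

print(df_with_nan.isna().sum())
```

Because the mask is drawn independently per cell, the number of NaNs per column varies from run to run unless the seed is fixed.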
How to Implement Pandas Groupby operation with NumPy?
Pandas’ GroupBy function is the bread and butter of many data munging activities. Groupby enables one of the most widely used paradigms, “Split-Apply-Combine”, for doing data analysis. Sometimes you will be working with NumPy arrays and may still want to perform groupby operations on the array. I just recently wrote a blogpost inspired by Jake’s post on […]
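As a rough sketch of split-apply-combine on plain NumPy arrays, one option is to combine `np.unique` with `np.bincount`. This is one possible technique, not necessarily the one the full post describes, and the `keys` and `values` arrays are made up.

```python
import numpy as np

# Made-up group labels and values
keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Split: map each key to an integer group id
groups, group_ids = np.unique(keys, return_inverse=True)

# Apply and combine: per-group sums and counts, then means
sums = np.bincount(group_ids, weights=values)
counts = np.bincount(group_ids)
means = sums / counts

for g, s, m in zip(groups, sums, means):
    print(g, s, m)
```

`np.bincount` only covers sums and counts; other aggregations would need a different approach, such as sorting by key and splitting the array.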
How To Specify Colors to Scatter Plots in Python
Scatter plots are extremely useful for analyzing the relationship between two quantitative variables in a data set. Often datasets contain multiple quantitative and categorical variables, and you may be interested in the relationship between two quantitative variables with respect to a third categorical variable. Coloring scatter plots by the group/categorical variable will greatly enhance the scatter […]
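A minimal sketch of coloring a scatter plot by a categorical variable with Matplotlib follows. The data, the group labels, and the `palette` mapping are all made up for illustration, and the non-interactive `Agg` backend is selected only so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: two quantitative variables and one categorical variable
x = rng.normal(size=60)
y = x + rng.normal(scale=0.5, size=60)
group = np.repeat(["A", "B", "C"], 20)

# Hypothetical mapping from category to color
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, ax = plt.subplots()
for name, color in palette.items():
    sel = group == name  # boolean selector for this category
    ax.scatter(x[sel], y[sel], color=color, label=name)
ax.legend(title="group")
fig.savefig("scatter_by_group.png")
```

Plotting one category per `scatter` call keeps the color choice explicit and gives a legend entry for each group.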
How To Select Columns by Data Type in Pandas?
Often when you are working with a bigger dataframe and doing some data cleaning or exploratory data analysis, you might want to select columns of a Pandas dataframe by their data types. For example, you might want to quickly select columns that are numerical in type and visualize their summary data. Or you might want to select […]
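For instance, `DataFrame.select_dtypes` can pick out the numeric columns, or everything except them. The small dataframe below is made up for illustration.

```python
import pandas as pd

# Made-up dataframe with mixed column types
df = pd.DataFrame({
    "name": ["x", "y"],
    "count": [1, 2],
    "score": [0.5, 1.5],
    "flag": [True, False],
})

# Keep only numeric columns (integers and floats; booleans are not included)
numeric = df.select_dtypes(include="number")

# Keep everything that is not numeric
non_numeric = df.select_dtypes(exclude="number")

print(numeric.describe())
print(list(non_numeric.columns))
```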