Computing standardized values of one or more columns is an important step in many machine learning analyses. For example, if we are using dimensionality reduction techniques like Principal Component Analysis (PCA), we will typically standardize all the variables. To standardize a variable, we subtract the mean of the variable from each value and […]
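As a rough illustration of the idea in this excerpt, here is a minimal sketch. It assumes the conventional z-score definition (subtract the column mean, then divide by the column standard deviation), which the truncated excerpt does not spell out. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0],
    "weight": [50.0, 60.0, 70.0],
})

# Assumed z-score: subtract each column's mean, then divide by its
# standard deviation. ddof=0 uses the population standard deviation;
# pandas defaults to ddof=1 (sample standard deviation).
standardized = (df - df.mean()) / df.std(ddof=0)

print(standardized)
```

After this step each column should have a mean of about 0 and a standard deviation of about 1, which puts the variables on a comparable scale before a method like PCA.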
Python
How To Get Number of Missing Values in Each Column in Pandas
In this post we will see how we can get the counts of missing values in each column of a Pandas dataframe. Dealing with missing values is one of the common tasks in data analysis with real data. A quick understanding of the number of missing values will help in deciding the next step […]
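One common way to get these counts, sketched here on a made-up dataframe, is to chain `isna()` and `sum()`. This may or may not be the exact approach the full post uses; the data and column names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, None],
    "c": [1, 2, 3],
})

# isna() marks missing cells as True; summing the booleans column-wise
# gives the number of missing values in each column.
missing_counts = df.isna().sum()

print(missing_counts)
```

The result is a Series indexed by column name, so it can be sorted or filtered to find the columns with the most missing data.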
Python Machine Learning Third Edition: Book Review
Finally got a chance to take a look at Sebastian Raschka’s Third Edition of Python Machine Learning, with its focus on Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2. It is a big book and has been around for a while in ML/DL time scales. I always wanted to check it out. Thanks to the […]
Seaborn Version 0.11.0 is here with displot, histplot and ecdfplot
Seaborn, one of the data visualization libraries in Python, has a new version, Seaborn version 0.11, with a lot of new updates. One of the biggest changes is that Seaborn now has a beautiful logo. Jokes apart, the new version has a lot of new things to make data visualization better. This is a quick […]
Pandas Groupby and Sum
A common step in data analysis is to group the data by a variable and compute some summary statistics for each subgroup of data. For example, one might be interested in the mean or median values, or the total sum per group. In this post, we will see an example of how to use the groupby() function in Pandas to […]
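A minimal sketch of grouping and summing with `groupby()` might look like the following. The dataframe, its column names, and the numbers are invented for this example and are not taken from the post.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe"],
    "population": [10, 20, 5],
})

# Group rows by continent, then sum the population within each group.
totals = df.groupby("continent")["population"].sum()

print(totals)
```

Swapping `sum()` for `mean()` or `median()` gives the other per-group summaries mentioned above.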