The Python Pandas library is well known for its amazing data munging capabilities. However, a somewhat underused feature of Pandas is its plotting capabilities. Yes, one can make better visualizations with Matplotlib, Seaborn, or Altair. Still, Pandas' plotting capabilities can be extremely handy when you are in exploratory data analysis mode and want to quickly […]
Altair 4.0 is here: Barplots, Scatter Plots with Regression Line and Boxplots
Altair 4.0 is here with a lot of new features. Altair is one of the newest data visualization libraries in Python, built on a grammar of interactive graphics. Altair is one of my favorites. It was not long ago, but I still remember the first time I saw an Altair plot, a chart in “Altair-speak”, and was pretty impressed […]
How To Discretize/Bin a Variable in Python with NumPy and Pandas?
Sometimes you may have a quantitative variable in your data set that you want to discretize, bin, or categorize based on its values. For example, let us say you have measurements of height and want to discretize them so that each value is 0 or 1 depending on […]
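As a rough illustration of the kind of discretization the excerpt describes, here is a minimal sketch. The height values, the 170 cm cutoff, and the bin edges and labels are all made-up assumptions for the example; the excerpt does not say which rule the post actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical height measurements in centimeters (illustrative values only).
heights = pd.Series([150.0, 162.5, 171.0, 180.2, 193.4], name="height_cm")

# Assumed cutoff for the 0/1 split; not taken from the original post.
THRESHOLD_CM = 170.0

# Binary discretization with NumPy: 1 if above the cutoff, otherwise 0.
is_tall = np.where(heights > THRESHOLD_CM, 1, 0)

# Multi-bin discretization with pandas, using assumed bin edges and labels.
# With the default right=True, the intervals are (0, 160], (160, 180], (180, 250].
height_bins = pd.cut(
    heights,
    bins=[0, 160, 180, 250],
    labels=["short", "medium", "tall"],
)

print(is_tall)
print(height_bins)
```

`np.where` is a natural fit for a two-way split, while `pd.cut` handles any number of bins and returns a categorical column.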
How to Highlight Data Points with Colors and Text in Python
Sometimes you might want to highlight a select set of data points on a scatter plot. Often when plotting scatter plots you might want to highlight data points in a different color from the rest of the data points. Other times you want to show select data points in a different color and annotate them with text. In […]
Data Science From Scratch 2nd Edition: Book Review
The second edition of Data Science from Scratch: First Principles with Python by Joel Grus is here (since the summer of 2019). The first edition of the book came out about 4-5 years ago, when data science as a field was nascent and the majority of Python code was in 2.7. There are two aspects to learn data […]