Finally got a chance to write down some quick thoughts on Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures.
ICYMI, Fundamentals of Data Visualization is a fantastic book on data visualization that was developed in the open and is freely available online. Just recently, the physical book became available for purchase.
I used the book extensively even before it got published, and it was great to grab a physical copy of Fundamentals of Data Visualization.
What is this book for?
Fundamentals of Data Visualization is a book for anyone who is interested in data science, is practicing data science, or wants to learn good data visualization. It is a great reference book for most data visualization questions. The main scope of the book is static data visualizations that are suitable for “print, online, or as slides”.
The classic use case for the book: given a data set and a question of interest, what type of visualization can tell the complete story? For example, the fifth chapter of the book, titled “Directory of visualizations”, gives a quick visual overview of multiple types of plots that are useful for the most common data visualization tasks. This chapter has been my go-to place for data visualization inspiration over the last year or two.
What is this book NOT for?
Fundamentals of Data Visualization is not a book on R. Yes, Claus Wilke developed the whole book using R, but this is not an R book. The book does not contain any code or programming techniques for making visualizations.
The author wants the reader to focus mainly on the concepts and the figures, not the code used to make them. However, anyone who is curious can check the book’s GitHub repository at https://github.com/clauswilke/dataviz. The GitHub repo for the book is a treasure trove for learning all things visualization with ggplot2, if you have a good grasp of ggplot2 and R.
Explaining more, the author says:
This constant change in software platforms is one of the key reasons why this book is not a programming book and why I have left out all code examples. I want this book to be useful to you regardless of which software you use, and I want it to remain valuable even once everybody has moved on from ggplot2 and uses the next new thing. I realize that this choice may be frustrating to some ggplot2 users who would like to know how I made a given figure. To them I say, read the source code of the book. It is available. Also, in the future I may release a supplementary document focused just on the code.
What do I really like about this book?
- The “Directory of visualizations” chapter is such a great resource to go to. It contains a visual gallery of common plot types. When you are starting on a data visualization problem, this chapter is fantastic and will give you tons of ideas about useful visualizations. If you are like me and find it really hard to name many visualization types, the visual gallery in this chapter is extremely useful for identifying the visualization the problem needs.
- I am a huge fan of Empirical Cumulative Distribution Function (ECDF) plots and Q-Q plots. An ECDF plot is a great alternative to a histogram, as it can show all the data. Similarly, Q-Q plots are a great way to compare two distributions. They are a bit esoteric, but once you get the hang of them, they are extremely powerful. And you don’t get to see them in many books, so I was really happy to see a whole chapter on them in the book 🙂
- Visualizing uncertainty is very important and can often be challenging. The book has a whole chapter covering various aspects of visualizing uncertainty. I have already learned a lot from this chapter and look forward to digging deeper and actually making the visualizations in the book soon.
- Chapter 17 on the “Principle of Proportional Ink” is a must-read for anyone thinking of making data visualizations. I was kind of sad that it is such a short chapter.
- I love the colored animals peppered throughout the book offering great tips and warnings on making good data visualizations.
- Staying on the topic of “warnings”, one of the coolest things about the book is how the author presents multiple ways to visualize a dataset and provides visualization examples that are ugly, bad and wrong 🙂 In every such example, the author gives his reason for making that call. It is just a fantastic way to learn how to avoid visualization mistakes and make better visualizations. It has been fun to pick a page from the book with a bad/wrong/ugly annotation and test my knowledge by guessing the reason for it. You will be surprised how much one can learn from this.
Here are some example figures; play the game “why is it bad/wrong/ugly?”.
- https://serialmentor.com/dataviz/visualizing-amounts.html#fig:boxoffice-rot-axis-tick-labels
- https://serialmentor.com/dataviz/visualizing-amounts.html#fig:boxoffice-horizontal-bad-order
- https://serialmentor.com/dataviz/boxplots-violins.html#fig:lincoln-temp-points-errorbars
- https://serialmentor.com/dataviz/visualizing-proportions.html#fig:marketshare-stacked
- https://serialmentor.com/dataviz/visualizing-proportions.html#fig:marital-vs-age
- https://serialmentor.com/dataviz/time-series.html#fig:bio-preprints-dots
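Going back to the ECDF and Q-Q plots I mentioned above: since the book deliberately leaves out code, here is a rough sketch of what goes into these two plots. This is my own illustration in plain Python, not code from the book or its repository. The function names `ecdf` and `qq_points` and the toy numbers are made up for this example, and it only computes the points you would plot, without drawing anything.

```python
from statistics import quantiles


def ecdf(values):
    """Return the sorted values and, for each one, the fraction of data <= it."""
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (0-based) sits at cumulative fraction (i + 1) / n.
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def qq_points(sample_a, sample_b, n_quantiles=9):
    """Pair up matching quantiles of two samples (the points of a Q-Q plot)."""
    # quantiles(..., n=k) returns k - 1 cut points, so ask for n_quantiles + 1.
    qa = quantiles(sample_a, n=n_quantiles + 1, method="inclusive")
    qb = quantiles(sample_b, n=n_quantiles + 1, method="inclusive")
    return list(zip(qa, qb))


# Made-up toy data, only for illustration.
scores = [3, 1, 2, 2]
xs, ys = ecdf(scores)

sample_a = [1, 2, 3, 4, 5]
sample_b = [v + 10 for v in sample_a]  # same shape, shifted by 10
pairs = qq_points(sample_a, sample_b, n_quantiles=3)
```

Every observation gets its own step in the ECDF, which is why it can show all the data without any binning choices. In the Q-Q sketch, the two samples have the same shape and differ only by a shift, so the quantile pairs fall on a straight line parallel to the diagonal. Departures from a straight line are what you look for when comparing two distributions.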
In short, it is a must-have reference book for anyone analyzing data in any field. Yes, the online version is freely available, but the print copy is a great bargain when Amazon sells it for $40 instead of the original $70.