If you ask any practicing data scientist to recommend a good book for learning data science, you will get many different, useful answers. Some may be introductory, some introductory with a bit of math rather than just coding, some a bit more advanced, and some from different domains. It is easy to see why: data science is a very broad term with applications in multiple domains.
Here are three very interesting, if somewhat disparate, books that will be of great use for different aspects of data science. Not surprisingly, the three books are by eminent scientists from very different domains. The first has just been published and is available only as a physical copy. The other two are not out in print yet but are available for free online, with physical copies expected in April 2019.
Linear Algebra and Learning from Data
Legendary MIT Professor Gilbert Strang’s linear algebra book and course are possibly the best introduction to linear algebra ever. Linear algebra is integral to the core of data science.
Yes, it is a bit advanced, but a good handle on matrices and matrix decompositions is a great asset in a data scientist’s toolbox. Prof. Strang’s linear algebra course lectures have been freely available online for over a decade, and they are still the best. They are a must for anyone who wants to understand matrices.
Now Prof. Gilbert Strang is back with a new book, titled Linear Algebra and Learning from Data. The new textbook has just been published and aims to help readers “to understand the steps that lead to deep learning”. The book uses “the full array of applied linear algebra, including randomization for very large matrices. Then deep learning creates a large-scale optimization problem for the weights solved by gradient descent or better stochastic gradient descent. Finally, the book develops the architectures of fully connected neural nets and of Convolutional Neural Nets (CNNs) to find patterns in data.”
The book’s description also says:
This book is for anyone who wants to learn how data is reduced and interpreted by and understand matrix methods. Based on the second linear algebra course taught by Professor Strang, whose lectures on the training data are widely known, it starts from scratch (the four fundamental subspaces) and is fully accessible without the first text.
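To make the “gradient descent or better stochastic gradient descent” idea from the description a little more concrete, here is a minimal sketch in plain Python. It is not taken from the book: the toy problem (fitting a line y = a*x + b), the function names, the learning rates, and the batch size are all assumptions chosen for illustration. Full gradient descent uses every data point for each update, while the stochastic version uses a small random batch.

```python
import random

def make_data(n=200, seed=0):
    """Toy data (an assumption for this sketch): y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.01) for x in xs]
    return xs, ys

def gradient(a, b, xs, ys):
    """Gradient of half the mean squared error of a*x + b with respect to (a, b)."""
    n = len(xs)
    ga = sum((a * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum((a * x + b - y) for x, y in zip(xs, ys)) / n
    return ga, gb

def gradient_descent(xs, ys, lr=0.5, steps=500):
    """Full-batch gradient descent: every step looks at all the data."""
    a = b = 0.0
    for _ in range(steps):
        ga, gb = gradient(a, b, xs, ys)
        a -= lr * ga
        b -= lr * gb
    return a, b

def stochastic_gradient_descent(xs, ys, lr=0.1, epochs=100, batch_size=10, seed=1):
    """Mini-batch SGD: every step looks at a small random subset of the data."""
    rng = random.Random(seed)
    a = b = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            bx = [xs[i] for i in batch]
            by = [ys[i] for i in batch]
            ga, gb = gradient(a, b, bx, by)
            a -= lr * ga
            b -= lr * gb
    return a, b

if __name__ == "__main__":
    xs, ys = make_data()
    print("GD :", gradient_descent(xs, ys))
    print("SGD:", stochastic_gradient_descent(xs, ys))
```

Both versions should recover a slope near 2 and an intercept near 1 on this toy data. The appeal of the stochastic version is that each update is much cheaper, which matters when the dataset is large.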
Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures
Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures, by UT Austin Professor Claus Wilke, is set to become a classic on the basics of data visualization. Prof. Wilke started developing the book online about a year ago. It has been fantastic to watch the book grow online, for free, with amazing content.
One of the chapters in the book, called “Directory of visualizations”, has been my first go-to place when starting any visualization. The chapter gives a quick visual overview of the different types of plots that could be suitable for common visualization tasks. The book then covers each item in the directory of visualizations in separate chapters.
The physical copy will be available in early April 2019. But why wait? The whole book is available for free here: https://serialmentor.com/dataviz//
Modern Statistics for Modern Biology
The third new book, Modern Statistics for Modern Biology, is the odd one out. Yes, it is meant for biologists. However, it is probably one of the best introductions to modern statistics using R, with great visualizations in ggplot2.
Stanford Prof. Susan Holmes and EMBL professor Wolfgang Huber wrote the book based on their teaching and made it freely available online long before the print version. One of the biggest highlights of the book is the way it teaches modern statistics, which is far more computational than before. The book has 13 well-written chapters that are accessible to beginners, with the right amount of R code, theory, and great visualization with ggplot2.
The three chapters that anyone interested in data science should start with right away are those on mixture models, multivariate analysis (covering dimensionality reduction), and supervised learning (covering linear discriminant analysis through glmnet). The example datasets used in the book are really interesting. No offense to the Iris dataset, but it is refreshing to see examples of supervised and unsupervised learning with datasets other than Iris 🙂
Finally, the print version of the book is available on Amazon; I can’t wait to get a copy.
[Updated: 3/3/2019]
Introduction to Probability (Second Edition)
It is never a good idea to write a post with “N” things :). Not even a few weeks went by before this post on “3 data science-y books” became outdated, thanks to the fantastic book Introduction to Probability by Joe Blitzstein and Jessica Hwang.
If you are not familiar with Joe Blitzstein, he is an amazing teacher, a Statistics Professor at Harvard, and widely known for his probability course there. The video lectures of the introductory probability course have been available on YouTube for a while, and they are a must-watch if you are anywhere close to data science.
Just a few years ago, Blitzstein and Hwang published the first edition of the book, Introduction to Probability, based on the course.
Joe Blitzstein has just published the second edition of the probability book and made it available for free online.
New edition of my probability book with Jessica Hwang is out! Read it FREE online at https://t.co/2iwKDNe0ag pic.twitter.com/5z1zOkJ3Y5
— Joe Blitzstein (@stat110) March 1, 2019
Introduction to Probability provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC).
Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.
The second edition adds many new examples, exercises, and explanations, to deepen understanding of the ideas, clarify subtle concepts, and respond to feedback from many students and readers.
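As a small taste of the simulation style the description mentions, here is a minimal sketch of a classic coincidence calculation, the birthday problem. The book’s own end-of-chapter examples are in R; this version uses Python, and the function names, trial count, and seed are assumptions for illustration rather than anything taken from the book. It compares the exact probability that at least two of k people share a birthday with an estimate from repeated random trials.

```python
import random

def exact_birthday_match(k, days=365):
    """Exact probability that at least two of k people share a birthday,
    assuming all birthdays are equally likely and independent."""
    p_no_match = 1.0
    for i in range(k):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match

def simulated_birthday_match(k, trials=20000, days=365, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        if len(set(birthdays)) < k:  # a repeated value means a shared birthday
            hits += 1
    return hits / trials

if __name__ == "__main__":
    k = 23
    print("exact    :", round(exact_birthday_match(k), 4))
    print("simulated:", round(simulated_birthday_match(k), 4))
```

With 23 people the exact probability is already a little over one half, and the simulated estimate should land close to it.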
The link to the free online version, http://probabilitybook.net/, is down right now (3/3/2019). One can access it at https://t.co/6zQfTlfGiM for some time. The print version of Introduction to Probability, Second Edition is also available now. In addition, there is a web resource for the book with great materials at https://projects.iq.harvard.edu/stat110. Don’t forget to check out Bayesville 🙂
I want all of these books NOW, but for now I have to build a probability model in R to prioritize which one to buy first 🙂
Check out a lot more free online data science books in R and three new Machine Learning books for Data Science.