21 Free Online Books to Learn R and Data Science

R Books/Resources for Data Science

If you are interested in learning data science with R but don't want to spend money on books, you are in luck. A number of fantastic R/data science books and resources are available online for free from top creators and scientists.

Here are 21 free (so far) online data science books and resources for learning data analytics, from people like Hadley Wickham, Winston Chang, Garrett Grolemund and Johns Hopkins University Professor Roger Peng.

[P.S.] Since the post was written, this list of fantastic data science books and resources has grown from 13 to 21. Tweet at @cmdline_tips to suggest any missing resources.

1. R for Data Science
by Hadley Wickham and Garrett Grolemund

R for Data Science

R for Data Science, by Hadley Wickham and Garrett Grolemund, is a great data science book for beginners interested in learning data science with R. It introduces R programming; RStudio, the free and open-source integrated development environment for R; and the tidyverse, a suite of R packages designed by Wickham "to work together to make data science fast, fluent, and fun". The book was written online and is freely available at https://r4ds.had.co.nz/. The physical copy, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, is priced at $40.00 without any discount, and is typically much cheaper when discounted. (Update: during Thanksgiving 2018, Amazon was selling R for Data Science for just $18.)

2. Introduction to Data Science, R. Irizarry

Introduction to Data Science, by R Irizarry

Rafael Irizarry, a Harvard professor and fantastic teacher, has published a wonderful introductory data science book. The book, titled Introduction to Data Science, is available for free (or name your price) from https://leanpub.com/datasciencebook. Introducing the book, Prof. Irizarry says:

This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning. It also helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, algorithm building with caret, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with knitr and R markdown. The book is divided into six parts: R, Data Visualization, Data Wrangling, Probability, Inference and Regression with R, Machine Learning, and Productivity Tools.

3. Advanced R by Hadley Wickham

Advanced R

Advanced R is another gem by Hadley Wickham, aimed at intermediate and advanced R users. In addition to teaching the fundamentals of R and its data types, the book shows how functional programming can be used to solve a wide range of problems.

R is not the fastest language, but if you want to make your R code faster and more memory efficient, this is the book for you. A free online version of the book is available at https://adv-r.had.co.nz/, and a physical copy of Advanced R costs around $35 on Amazon.

4. R Packages by Hadley Wickham

R Packages

Hadley Wickham has made yet another book available for free, this one on how to create your own R packages. I forget where I read it, but the advice went something like this: if you do the same thing three times while programming, write a function; and if you write three functions to do something, it is time to create a package. Writing R packages is definitely a skill worth having in your data science toolkit. Hadley Wickham summarizes the book beautifully:

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this section you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first. So start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.

Get started packaging your functions with the R Packages book, freely available at https://r-pkgs.had.co.nz/. A physical copy of R Packages is available for about $28 on Amazon.

5. ggplot2 by Hadley Wickham

ggplot2

ggplot2 is without doubt one of the best data visualization tools available for exploring data and doing data science. The original ggplot2 book is about 10 years old and is not freely available. Hadley Wickham has been working on a new ggplot2 book, and it is available online for free at https://github.com/hadley/ggplot2-book.

Once you get the hang of ggplot2, its reference website is a fantastic resource for all things data visualization. It documents every ggplot2 function, and each reference page lists all the available options for that function, followed by easy-to-understand code chunks showing how to use it to create the visualization you want.
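To give a flavour of the kind of code chunk you will find there, here is a minimal ggplot2 sketch. It assumes the mpg data set that ships with ggplot2; the choice of variables, the linear trend line, and the faceting are illustrative picks, not an example taken from the reference site.

```r
library(ggplot2)

# mpg is a fuel-economy data set bundled with ggplot2
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +                          # one point per car, coloured by vehicle class
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # straight-line trend within each panel
  facet_wrap(~ drv) +                                        # one panel per drive type
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       colour = "Vehicle class")

print(p)
```

Each `+` adds one layer or setting, which is why the reference pages can document every geom, facet, and label function on its own.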

6. Cookbook for R, Winston Chang

Winston Chang’s R Graphics Cookbook 2nd Edition

Winston Chang from RStudio has a great book on all things graphics with R, the R Graphics Cookbook. Unfortunately, that book is not free online.

Winston Chang has made a great resource website, https://www.cookbook-r.com/, available for free. It is a fantastic resource for anyone getting started with plotting with ggplot2 and more.

It has lots of code chunks answering common questions that arise while making publication-quality graphics with R.

[Updated 28 April 2019] The second edition of R Graphics Cookbook: Practical Recipes for Visualizing Data is now available for purchase, and Winston Chang has also made it available for free online at https://r-graphics.org/. The second edition is tidyverse friendly, too 🙂

7. Data Visualization: A practical introduction, by Kieran Healy

Data Visualization, A practical introduction

Data Visualization: A Practical Introduction, by Duke University professor Kieran Healy, is a new book that is not yet available in print. However, a draft version (a reaaaallly good one) is available online for free at https://socviz.co.

This data science book does not assume prior knowledge of R and offers a hands-on introduction to visualizing data using R and Hadley Wickham's ggplot2. One of the good things about this book is its series of worked examples that help you build visualizations for data science piece by piece, from simple scatterplots to more complex graphics. If you love playing with geospatial data and want to make awesome maps, this book has a whole chapter devoted to visualizing geographical data with R.

Update: A physical copy of Data Visualization: A Practical Introduction became available on Amazon just in time for the 2018 holiday season, and it costs $40 without any discount.

Want to know more? Check out our review of the book here.

8. R Programming for Data Science, Roger D Peng

R Programming for Data Science

R Programming for Data Science is a great data science book from Roger D. Peng, a JHU professor, based on material from his Johns Hopkins Data Science Specialization courses. The book is available online at Leanpub, where you can set your own price, from 0 dollars to anything you wish. It is also available in print through Lulu.

This data science book covers the basics of R programming needed for doing data science with R, along with interesting topics you may not see elsewhere, such as regular expressions, debugging, parallel computing, and profiling R code.

9. Exploratory Data Analysis with R, Roger D Peng

Exploratory Data Analysis with R

It is the awesome Roger Peng again, and this time the book is all about exploratory data analysis using R. This book is also based on courses from the Johns Hopkins Data Science Specialization and is available from https://leanpub.com/exdata for whatever price you are willing to pay (zero to anything).

In addition to covering the basics of exploratory analysis, the book covers topics needed for analyzing and visualizing high-dimensional data, such as hierarchical clustering, k-means clustering, and dimensionality reduction techniques (SVD and PCA).
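For a feel of these techniques in practice, here is a minimal base-R sketch (not code from the book), using the built-in iris measurements as a stand-in data set and assuming three clusters. It runs PCA, k-means, and hierarchical clustering, and checks that the component standard deviations from prcomp() are the singular values of the centered data divided by sqrt(n - 1), which is why SVD and PCA tend to be taught together.

```r
# Base R only: prcomp(), svd(), kmeans(), hclust() come with every R installation
X <- scale(iris[, 1:4])   # standardize the four numeric measurement columns

# Dimensionality reduction: PCA
pca <- prcomp(X)
summary(pca)              # proportion of variance explained by each component

# prcomp() works through an SVD of the centered data:
# component standard deviations = singular values / sqrt(n - 1)
all.equal(pca$sdev, svd(X)$d / sqrt(nrow(X) - 1))   # TRUE

# K-means with 3 clusters (an assumption); nstart = 25 guards against poor random starts
set.seed(42)
km <- kmeans(X, centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)

# Hierarchical clustering on Euclidean distances, cut into 3 groups
hc <- hclust(dist(X), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Scatterplot of the first two principal components, coloured by k-means cluster
plot(pca$x[, 1:2], col = km$cluster, pch = 19)
```

Standardizing first matters here: without scale(), the column with the largest variance would dominate both the distances and the principal components.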

In addition to these books, Roger Peng and Dr. Hilary Parker, a data scientist at Stitch Fix, have a wonderful data science podcast, "Not So Standard Deviations". This podcast is a must if you are interested in learning data science and want to pursue a data science career. In it, they discuss all things data science, including common issues and problems in analyzing data. The podcast offers a really nice perspective on data science as a field in both academia and industry.

10. An Introduction to Statistical and Data Sciences via R, Chester Ismay and Albert Y. Kim

An Introduction to Statistical and Data Sciences via R

It is an online-only data science book, developed by Chester Ismay and Albert Y. Kim, that covers the basics of statistics for data science using R. The book teaches you how to explore data, the basics of statistics for data science, and how to create data stories using R. If you are curious about data stories, check out the samples of "data stories" from one of the classes here. This book was written using RStudio's bookdown package and is available at ModernDive.com.

11. Software Carpentry

Software Carpentry

Software Carpentry is a volunteer-run non-profit organization whose goal is to teach basic computing skills to researchers. It has hundreds of volunteers around the world teaching two-day workshops for beginners on a variety of computing topics, and troves of open-source lesson materials polished by these volunteer instructors. Software Carpentry has two workshop lessons teaching R to people with no prior programming experience.

  • Programming with R

    • The Programming with R lesson teaches the basics of the R language and of data analysis using a simple data set. It also teaches you how to make dynamic documents with R Markdown using knitr and how to create R packages.
  • R for Reproducible Scientific Analysis

    • R for Reproducible Scientific Analysis teaches the basics of R to beginners using the rich gapminder data set, real-world data on countries over a long time period. The workshop lessons cover data structures in R, data visualization with ggplot2, data frame manipulation with dplyr and tidyr, and making reproducible Markdown documents with knitr.

Why wait? Look here to find out whether there are any two-day Software Carpentry workshops (mostly free) near you.

12. Text Mining with R: A Tidy Approach by Julia Silge and David Robinson

Text Mining with R

Text Mining with R: A Tidy Approach is a great introductory book for learning to mine text data with R. Even better, it applies the principles of tidy data, so you get to practice tidyverse principles on text datasets. It has loads of examples of using R and the tidyverse to explore literature, news, and social media data and gain meaningful insights. It is a must-read for doing data science with text and sentiment analysis.

If you are interested in analyzing social media data, this book is for you. It has a whole chapter on analyzing Twitter data and doing sentiment analysis. The book is freely available online at https://www.tidytextmining.com/. A physical copy of Text Mining with R: A Tidy Approach is available for purchase for about $30.00.
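Here is a minimal sketch of the tidy text workflow the book is built around: one word per row via unnest_tokens(), stop-word removal with anti_join(), and sentiment scoring by joining against a lexicon. The two short documents are made up for illustration, and the Bing lexicon is just one of the lexicons get_sentiments() can return.

```r
library(dplyr)
library(tidytext)

# Two made-up "documents" standing in for tweets, news snippets, or book lines
docs <- tibble(
  doc  = c(1, 2),
  text = c("I love this wonderful, happy book!",
           "The plot was slow and the ending was terrible.")
)

# Tidy text format: one token (word) per row.
# unnest_tokens() lowercases and strips punctuation by default.
tokens <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")   # drop very common words such as "the" and "was"

# Word frequencies across all documents
tokens %>% count(word, sort = TRUE)

# Sentiment: keep words found in the Bing lexicon, then count positive/negative words per document
doc_sentiment <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(doc, sentiment)
doc_sentiment
```

Because every step returns an ordinary data frame, the same dplyr and ggplot2 tools you use elsewhere work unchanged on text.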

13. Introduction to Empirical Bayes: Examples from Baseball Statistics by David Robinson

Introduction to Empirical Bayes

This empirical Bayes e-book, which started as a series of blog posts, introduces the empirical Bayesian approach to estimation, credible intervals, A/B testing, and mixture models, with R code examples using baseball batting averages.

Although the title suggests the book is for baseball aficionados, it is a treat for anyone learning data science. The statistical methods it illustrates (with data and R) work just as well for estimating click-through rates on ads, success rates of experiments, and so on. It is one of the best books for learning data science and statistics for data science.
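The core idea fits in a few lines of R. The sketch below is hypothetical and simplified: it simulates batting records, fits a beta prior to players with at least 500 at-bats (an arbitrary cutoff) using a simple method-of-moments fit, which may differ from the fitting approach used in the book, and then shrinks every player's average toward the prior mean with (hits + alpha0) / (at-bats + alpha0 + beta0).

```r
set.seed(2024)

# Hypothetical data: simulated "true" batting skill, at-bats, and hits for 500 players
n_players <- 500
true_avg  <- rbeta(n_players, 80, 220)
at_bats   <- sample(c(5:50, 300:5000), n_players, replace = TRUE)
hits      <- rbinom(n_players, size = at_bats, prob = true_avg)
raw_avg   <- hits / at_bats

# Step 1: estimate a beta prior from well-measured players (>= 500 at-bats is an arbitrary cutoff).
# Method of moments: for Beta(a, b), mean m = a / (a + b) and
# variance v = m (1 - m) / (a + b + 1), so a + b = m (1 - m) / v - 1.
well_measured <- raw_avg[at_bats >= 500]
m <- mean(well_measured)
v <- var(well_measured)
ab_sum <- m * (1 - m) / v - 1
alpha0 <- m * ab_sum
beta0  <- (1 - m) * ab_sum
prior_mean <- alpha0 / (alpha0 + beta0)

# Step 2: posterior mean for each player under the beta-binomial model
eb_avg <- (hits + alpha0) / (at_bats + alpha0 + beta0)

# Players with few at-bats are pulled hardest toward the prior mean
players <- data.frame(at_bats, hits, raw = round(raw_avg, 3), eb = round(eb_avg, 3))
head(players[order(players$at_bats), ])
```

Each estimate is a weighted average of the raw average and the prior mean, with weight at_bats / (at_bats + alpha0 + beta0) on the raw average, so a 4-for-10 start barely moves a player away from the league-wide prior while a full season's record mostly speaks for itself.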

The PDF version of this book is available at https://gumroad.com/l/empirical-bayes for a name-your-own price, starting from $0.

14. Data Analysis for the Life Sciences with R, Rafael A Irizarry and Michael I Love

Data Analysis for the Life Sciences

If you are interested in learning data analysis and statistical analysis with R in the life sciences, the Harvard team of Irizarry and Love has a great book: Data Analysis for the Life Sciences with R. Although this book mainly focuses on high-throughput data from genomics, the methods it describes are well suited to modern data science in any domain.

The book grew out of multiple data science courses taught on the popular HarvardX platform. It covers a wide range of topics, including exploratory data analysis, basic statistical models, inference for high-dimensional data, dimensionality reduction, and basic machine learning.

This book covers all these rich topics without bogging you down in the math behind them. Instead, it offers R code for solving problems with data, which helps you build better intuition for the math and theory. The PDF version of this book is freely available on Leanpub, https://leanpub.com/dataanalysisforthelifesciences, with the option to name your price.

15. Fundamentals of Data Visualization, Claus Wilke

Fundamentals of Data Visualization

Claus Wilke, a professor at UT Austin, has a new upcoming book on data visualization, one of the key aspects of data science. The book is currently titled "Fundamentals of Data Visualization" and will be published by O'Reilly. Wilke wrote the book online, and an early version is freely available at https://serialmentor.com/dataviz/.
The book is now ready to pre-order at Amazon.

16. stat545

STAT 545

STAT 545, aka Data wrangling, exploration, and analysis with R, is one of the best courses on data munging and all things R, initially taught by Jenny Bryan at UBC. It is a must if you are interested in R and want to learn data analysis and make it easily reproducible, reusable, and shareable. Check out https://github.com/STAT545-UBC.

17. Hands-On Programming with R, by Garrett Grolemund

RStudio has made the fantastic introductory book Hands-On Programming with R by Garrett Grolemund available online for free. This book is aimed at non-programmers and provides a great introduction to the R language.

You’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools. Throughout the book, you’ll use your newfound skills to solve practical data science problems.

18. Mastering Software Development in R

Mastering Software Development in R

Mastering Software Development in R, by Roger D. Peng, Sean Kross, and Brooke Anderson, is a great book that teaches software development principles for building data science tools in R.

This book provides rigorous training in the R language and covers modern software development practices for building tools that are highly reusable, modular, and suitable for use in a team-based environment or a community of developers.

This book is about using R to develop the tools for doing data science. Whether you are on a data science team or working by yourself as part of a community of developers or data scientists, you will find this book useful as a reference for the software development process in R. Throughout, we focus on the aspects of the R language that are relevant to developing code and tools that will be used by others.

Currently, the book is available for "Free or Choose your price" at Leanpub, https://leanpub.com/msdr, and a print version will be available soon.

19. Caret Bookdown

The caret package (short for Classification And REgression Training), by RStudio's Max Kuhn, is a great R package for all things predictive modeling. Max Kuhn's bookdown documentation for caret is freely available online at https://topepo.github.io/caret/.

It is a fantastic resource that teaches both the basics and the nitty-gritty details of data splitting, pre-processing, feature selection, and model tuning for common machine learning problems in R. The bookdown site is a free alternative to Max Kuhn's Applied Predictive Modeling book, which offers a detailed introduction to predictive modeling using the caret package.
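Here is a minimal caret sketch covering three of those steps (data splitting, pre-processing, and model tuning) on the built-in iris data. The decision-tree model (method = "rpart", which needs the rpart package), 5-fold cross-validation, and tuneLength = 5 are illustrative choices, not recommendations from the bookdown; feature selection is left out to keep it short.

```r
library(caret)   # method = "rpart" below also requires the rpart package

set.seed(42)

# Data splitting: a stratified 80/20 split that keeps class proportions similar
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_df <- iris[in_train, ]
test_df  <- iris[-in_train, ]

# Resampling scheme used to compare candidate tuning values: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Pre-processing (centering and scaling) plus tuning of rpart's complexity parameter
fit <- train(Species ~ ., data = train_df,
             method     = "rpart",
             preProcess = c("center", "scale"),
             tuneLength = 5,
             trControl  = ctrl)

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test_df)
confusionMatrix(pred, test_df$Species)
```

Swapping in a different model is mostly a matter of changing the method string, which is a big part of caret's appeal: the splitting, resampling, and evaluation code stays the same.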

20. Modern Statistics for Modern Biology

Modern Statistics for Modern Biology

Modern Statistics for Modern Biology, by Stanford Prof. Susan Holmes and EMBL Prof. Wolfgang Huber, is a great introductory book on modern statistics, and not just for modern biology. The book grew out of their teaching and has been freely available online for a while. Its 13 chapters are accessible to beginners, with the right amount of R code, theory, and great visualizations with ggplot2. It covers various aspects of statistics for data science, including mixture models, clustering, testing, and dimensionality reduction techniques like PCA and SVD.


21. Tidyverse Skills for Data Science with R

Tidyverse Skills for Data Science in R
Roger Peng is at it again with a new book, "Tidyverse Skills for Data Science with R", this time with Stephanie Hicks, @mirnas22, and @Shannon_E_Ellis, on Leanpub. Yes, paying is optional for the digital version of the book.
