13 Best Free Online Resources/Books to learn R and Data Science

R Books/Resources for Data Science

If you are interested in learning R and Data Science but not in spending money on books, you are in the right place. A number of fantastic books and resources are available online for free from some of the best-known package authors and data scientists.

Here are 13 of the best free (so far) online books and resources for learning R and Data Science, from people like Hadley Wickham, Winston Chang, Garrett Grolemund and JHU Professor Roger Peng.

R for Data Science

R for Data Science, by Hadley Wickham and Garrett Grolemund, is a great book that introduces R programming, RStudio (the free and open-source integrated development environment for R), and the tidyverse, a suite of R packages designed by Wickham “to work together to make data science fast, fluent, and fun”. The book was written in the open and is available for free online at http://r4ds.had.co.nz/. The print edition of R for Data Science is priced at $40.00 before any discount.

Advanced R by Hadley Wickham

Advanced R is another gem by Hadley Wickham, aimed at intermediate and advanced R users. In addition to teaching the fundamentals of R and its data types, the book shows how functional programming can be used to solve a wide range of problems.

R is not the fastest language, but if you are interested in making your R code faster and more memory efficient, this is the book you want. A free online version of the book is available at http://adv-r.had.co.nz/, and a physical copy costs around $35 on Amazon.

R Packages by Hadley Wickham

Hadley Wickham has made yet another book available for free, this one on how to create your own R packages. I forget where I read it, but the advice went something like this: if you find yourself doing the same thing three times while programming, write a function, and if you have written three functions to do something, it is time to create a package. Hadley Wickham summarizes the book beautifully:

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this section you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first. So start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.

Get started packaging your functions with the R Packages book, freely available at http://r-pkgs.had.co.nz/.
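The "write a function once you repeat yourself" rule of thumb can be sketched in a few lines of base R. This is only an illustration: the data frame `df` and the helper name `rescale01` are hypothetical and do not come from the R Packages book.

```r
# Hypothetical data frame, made up purely for illustration
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6), c = c(-3, 0, 3))

# Instead of copy-pasting the rescaling formula for every column,
# write it once as a function (the name rescale01 is an assumption)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply the same function to every column, keeping the data frame shape
df[] <- lapply(df, rescale01)
df
```

Once a handful of helpers like this accumulate, bundling them with their documentation into a package is the natural next step.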


ggplot2 by Hadley Wickham

Once you get the hang of ggplot2, its reference website is a fantastic resource for using ggplot2 and customizing your plots the way you want. The website documents all of the ggplot2 functions. Each reference page lists the available options for a function, followed by easy-to-understand code chunks showing how to use it to make a plot.

The original ggplot2 book is about 10 years old and not freely available. Hadley Wickham has been working on a new ggplot2 book, and it is available online for free at https://github.com/hadley/ggplot2-book.

Cookbook for R, Winston Chang

Winston Chang from RStudio has a great book on all things graphics with R, “R Graphics Cookbook”. Unfortunately, that book is not free online. However, Winston Chang has made a great resource website, http://www.cookbook-r.com/, available for free. It is a fantastic resource for anyone getting started with plotting with ggplot2 and more. It has lots of code chunks answering common questions that arise while learning graphics with R.

Data Visualization: A practical introduction by Kieran Healy

Data Visualization: A practical introduction, by Duke University professor Kieran Healy, is a new book that is not available in print yet. However, the draft version (a really good one) is available online for free at http://socviz.co.

The book does not assume prior knowledge of R and offers a hands-on introduction to visualizing data using R and Hadley Wickham’s ggplot2. One of the good things about this book is its series of worked examples that help you build plots piece by piece, from simple scatterplots to more complex graphics. If you love playing with geo-spatial data and want to make awesome maps, this book has a whole chapter devoted to visualizing geographical data with R.

R Programming for Data Science, Roger D Peng

R Programming for Data Science is a great book from JHU professor Roger D Peng, with materials from his Johns Hopkins Data Science Specialization course. The book is available online at Leanpub, where you can name your own price, from 0 dollars to anything you wish. The book is also available in print through Lulu.

The book covers the basics of R programming needed for doing data science with R, along with interesting topics that you may not see elsewhere, like regular expressions, debugging, parallel computing, and R profiling.

Exploratory Data Analysis with R, Roger D Peng

It is the awesome Roger Peng again, and this time the book is all about Exploratory Data Analysis using R. This book is also based on courses from the Johns Hopkins Data Science Specialization and is available from https://leanpub.com/exdata for whatever price you are willing to pay (zero to anything).

In addition to covering the basics of exploratory analysis, the book also covers topics needed for analyzing and visualizing high-dimensional data, like hierarchical clustering, k-means clustering, and the dimensionality reduction techniques SVD and PCA.
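To give a flavor of these techniques, here is a minimal base R sketch, not taken from the book. It assumes the built-in `USArrests` data set as a stand-in for real high-dimensional data, and the choice of four clusters is arbitrary.

```r
# Base R only; USArrests ships with R (50 states, 4 numeric columns).
x <- scale(USArrests)            # center and scale each column

# Hierarchical clustering on Euclidean distances, cut into 4 groups
hc <- hclust(dist(x), method = "complete")
hc_groups <- cutree(hc, k = 4)

# K-means with the same number of clusters (seed fixed for repeatability)
set.seed(1)
km <- kmeans(x, centers = 4, nstart = 25)

# PCA via prcomp, and the same decomposition via SVD
pca <- prcomp(x)
sv  <- svd(x)

# The singular values, rescaled by sqrt(n - 1), match the PCA standard deviations
all.equal(sv$d / sqrt(nrow(x) - 1), pca$sdev)

# Cross-tabulate the two clusterings to see how far they agree
table(hierarchical = hc_groups, kmeans = km$cluster)
```

The `all.equal()` line makes the link between the two dimensionality reduction techniques concrete: on centered data, PCA is an SVD with rescaled singular values.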

In addition to these books, Roger Peng and Dr. Hilary Parker, a Data Scientist at Stitch Fix, have a wonderful podcast, “Not So Standard Deviations”. This podcast is a must if you are interested in pursuing Data Science as a career. In it, they discuss all things data science, including common issues and problems in analyzing data. The podcast offers a really nice perspective on data science as a field in both academia and industry.

An Introduction to Statistical and Data Sciences via R,  Chester Ismay and Albert Y. Kim

This is an online-only book developed by Chester Ismay and Albert Y. Kim. It covers the basics of using R to analyze data and create data stories using R and statistics. One of the authors has also shared samples of “data stories” from one of their classes. The book was written using RStudio’s bookdown package and is available at ModernDive.com.


Software Carpentry

Software Carpentry is a volunteer-run non-profit organization whose goal is to teach basic computing skills to researchers. It has hundreds of volunteers around the world teaching two-day workshops for beginners on a variety of computing topics. They have troves of open-source lesson materials polished by these volunteer instructors. Software Carpentry has two workshop lessons teaching R to people with no prior programming experience.

  • Programming with R

    • The Programming with R lesson teaches the basics of programming in R and the basics of data analysis using a simple data set. Not just that, it also teaches you how to make dynamic documents with R Markdown using knitr and how to create R packages.
  • R for Reproducible Scientific Analysis

    • R for Reproducible Scientific Analysis teaches the basics of R to beginners with the rich gapminder data set, a real-world data set on countries over a long time period. The workshop lessons cover data structures in R, data frame manipulation with dplyr and tidyr, and making reproducible R Markdown documents with knitr.

Why wait? Check the Software Carpentry website to see whether a two-day workshop (mostly free) is coming up near you.

Text Mining with R: A Tidy Approach by Julia Silge and David Robinson

Text Mining with R is a great introductory book for learning to mine text data with R. Better still, it uses the principles of tidy data and thus lets you practice tidyverse principles on text datasets. It has loads of examples of using R and the tidyverse to explore literature, news, and social media data and gain meaningful insights.

If you are interested in analyzing social media data, this book is for you. It has a whole chapter on analyzing Twitter data and doing sentiment analysis. The book is freely available online at https://www.tidytextmining.com/. A physical copy of the book is also available for purchase.

Introduction to Empirical Bayes: Examples from Baseball Statistics by  David Robinson

This Empirical Bayes ebook, which started as a series of blog posts, introduces the empirical Bayesian approach to estimation, credible intervals, A/B testing and mixture models, with R code examples using baseball batting averages.

Although the title suggests this book is for baseball aficionados, it is a treat for anyone learning modern data science. The statistical methods illustrated (with data and R) in the book are the same ones that are effective for estimating click-through rates on ads, success rates of experiments, and so on.
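As a rough sketch of the idea, the base R snippet below shrinks noisy success rates toward a prior estimated from the data itself. Everything in it is an assumption made for illustration: the data are simulated rather than real baseball statistics, the variable names are made up, and the prior is fitted with a quick method-of-moments shortcut, which is not necessarily how the book does it.

```r
# Simulated data, not real baseball statistics: each "player" has a true
# success rate drawn from a Beta(80, 220) prior chosen purely for illustration.
set.seed(42)
n_players <- 500
true_rate <- rbeta(n_players, 80, 220)
at_bats   <- sample(5:600, n_players, replace = TRUE)
hits      <- rbinom(n_players, size = at_bats, prob = true_rate)

raw <- hits / at_bats            # naive estimate, very noisy for few at-bats

# Step 1: estimate a Beta(alpha0, beta0) prior from the data itself.
# Method of moments on players with many trials is a rough shortcut;
# it ignores binomial noise, so it overstates the prior variance somewhat.
enough <- at_bats >= 200
m <- mean(raw[enough])
v <- var(raw[enough])
common <- m * (1 - m) / v - 1
alpha0 <- m * common
beta0  <- (1 - m) * common

# Step 2: the posterior mean for each player is the shrunken estimate
eb <- (hits + alpha0) / (at_bats + alpha0 + beta0)

# Step 3: 95% credible interval from each player's Beta posterior
lower <- qbeta(0.025, hits + alpha0, at_bats - hits + beta0)
upper <- qbeta(0.975, hits + alpha0, at_bats - hits + beta0)

# Compare mean squared error against the simulated true rates
c(raw_mse = mean((raw - true_rate)^2), eb_mse = mean((eb - true_rate)^2))
```

Replace hits and at-bats with clicks and impressions and the same few lines apply to click-through rates.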

The PDF version of this book is available at https://gumroad.com/l/empirical-bayes, where you can name your own price starting from $0.

Data Analysis for the Life Sciences with R,  Rafael A Irizarry and Michael I Love

If you are interested in learning data analysis and statistical analysis with R in the life sciences, the Harvard team of Irizarry and Love has a great book, Data Analysis for the Life Sciences with R. Although this book mainly focuses on high-throughput data from genomics, the methods it describes are well suited to modern data science in any domain.

The book grew out of teaching multiple courses on the popular HarvardX platform. It covers a wide range of topics, including exploratory data analysis, basic statistical models, inference for high-dimensional data, dimensionality reduction and basic machine learning.

This book covers all these rich topics without getting you bogged down in the math behind them. It offers R code to solve problems with data and helps you gain better intuition for the math and theory. The PDF version of this book is available freely on Leanpub, https://leanpub.com/dataanalysisforthelifesciences, with the option to name your price.

stat545