There are multiple ways to read rectangular text files, such as csv files, tsv files, or text files with other common delimiters. The readr package, part of the tidyverse, offers seven functions to load flat text files easily.
How to load a text file with readr package?
- read_csv(): to read comma delimited files.
- read_csv2(): to read semicolon separated files.
- read_tsv(): to read tab delimited files.
- read_delim(): to read in files with any delimiter.
- read_fwf(): to read fixed width files.
- read_table(): to read a common variation of fixed width files where columns are separated by white space.
- read_log(): to read Apache style log files.
It does look daunting at first to see this many options :-). Fear not: all of these functions share a similar syntax, so if you learn one, you have learned them all. Let us see examples of using read_csv().
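As a quick illustration of that shared syntax, here is a minimal sketch that writes the same small table to three temporary files and reads each one back. The data values and file paths are made up for the example and are not part of the gapminder data used below.

```r
library(readr)

# Hypothetical example data, written with three different delimiters
csv_path  <- tempfile(fileext = ".csv")
tsv_path  <- tempfile(fileext = ".tsv")
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("country,year", "Norway,2007", "Chile,2007"), csv_path)
writeLines(c("country\tyear", "Norway\t2007", "Chile\t2007"), tsv_path)
writeLines(c("country|year", "Norway|2007", "Chile|2007"), pipe_path)

# The call looks the same each time; only the function (or delim) changes
from_csv  <- read_csv(csv_path)
from_tsv  <- read_tsv(tsv_path)
from_pipe <- read_delim(pipe_path, delim = "|")
```

All three objects should hold the same two columns and two rows.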
How to load a csv file with read_csv()?
The simplest way to read a csv file is to call read_csv() with the path to the csv file as its argument. readr takes care of the rest and tells you how it parsed the file by printing the columns and their types.
> gapminder = read_csv("data/gapminder.csv")
Parsed with column specification:
cols(
  country = col_character(),
  year = col_integer(),
  pop = col_double(),
  continent = col_character(),
  lifeExp = col_double(),
  gdpPercap = col_double()
)
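If the guessed types are not what you want, the column specification can also be supplied by hand through the col_types argument. The sketch below uses a tiny made-up file rather than the real gapminder.csv, and the guessed type of whole-number columns such as year may differ between readr versions.

```r
library(readr)

# Hypothetical stand-in for a gapminder-like file
path <- tempfile(fileext = ".csv")
writeLines(c("country,year,lifeExp",
             "Norway,2007,80.2",
             "Chile,2007,78.6"), path)

# Let readr guess, then inspect the specification it used
guessed <- read_csv(path)
spec(guessed)

# Or state the column types explicitly instead of relying on the guess
typed <- read_csv(path, col_types = cols(
  country = col_character(),
  year    = col_integer(),
  lifeExp = col_double()
))
```

Stating the types explicitly makes the import reproducible even if the first rows of a file would lead the guesser astray.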
All of these functions have loads of useful options. For example, here are all the options available for read_csv().
> help(read_csv)
read_csv(file, col_names = TRUE, col_types = NULL,
  locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max), progress = show_progress())
...
By default, read_csv() treats the first line as the column names. If the file has no header row, we can change that with
read_csv(file, col_names=FALSE)
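Here is a small sketch of what that looks like on a header-less file. The file contents and the column names passed in the second call are made up for illustration.

```r
library(readr)

# Hypothetical file with no header row
path <- tempfile(fileext = ".csv")
writeLines(c("Norway,2007,80.2",
             "Chile,2007,78.6"), path)

# col_names = FALSE: readr generates the names X1, X2, X3
no_header <- read_csv(path, col_names = FALSE)

# col_names also accepts a character vector of names to use instead
named <- read_csv(path, col_names = c("country", "year", "lifeExp"))
```

In both cases the first line of the file is kept as data instead of being consumed as a header.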
How to Skip Lines while reading a CSV file in R?
Text files often have metadata in the first few lines. We can skip it by giving the number of lines to skip in the “skip” argument (the default, “skip = 0”, skips nothing). For example, to skip 4 lines while reading the csv file we can use
read_csv(file, skip = 4)
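A minimal sketch of skipping metadata, assuming a made-up file whose first four lines are free-text notes followed by the real header:

```r
library(readr)

# Hypothetical file: four metadata lines, then the header and the data
path <- tempfile(fileext = ".csv")
writeLines(c("Source: made-up example",
             "Downloaded: some date",
             "Units: years",
             "Contact: nobody",
             "country,year",
             "Norway,2007",
             "Chile,2007"), path)

# Skip the four metadata lines; line 5 is then used as the header
gap <- read_csv(path, skip = 4)
```

The skipped lines are dropped before the header is read, so the column names come from the fifth line.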
How to Skip Comment Lines while reading a CSV file in R?
We can also mark lines as comments so that they are not read, by specifying the “comment” argument. For example, if comments in the text file start with “#”, then we can read it as
read_csv(file, comment = "#")
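The sketch below tries this on a made-up file with comment lines before the header and in the middle of the data. The exact handling of comment lines has changed between readr releases, so treat this as an illustration of recent versions rather than a guarantee.

```r
library(readr)

# Hypothetical file with "#" comment lines
path <- tempfile(fileext = ".csv")
writeLines(c("# made-up example data",
             "country,year",
             "Norway,2007",
             "# a note in the middle of the file",
             "Chile,2007"), path)

# Lines starting with "#" are ignored while parsing
gap <- read_csv(path, comment = "#")
```

Unlike skip, this does not require knowing in advance how many lines to drop.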
How to Read Compressed Text/CSV files in R?
Another great thing about the readr package’s functions for reading text files is that they all recognize compressed files: a file ending in .gz, .bz2, .xz, or .zip is automatically uncompressed. You don’t have to do anything. Isn’t that cool? read_csv() (and the others) will uncompress the file for you and load its contents into R.
read_csv("file.gz")
And guess what: any file on the web whose link starts with http://, https://, ftp://, or ftps:// will be automatically downloaded.
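The compressed-file behaviour can be tried locally without any download. This sketch writes a small made-up table to a gzip-compressed temporary file with base R's gzfile() and then reads it back directly.

```r
library(readr)

# Write a hypothetical csv straight into a gzip-compressed file
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
writeLines(c("country,year", "Norway,2007", "Chile,2007"), con)
close(con)

# No manual decompression step: read_csv() handles the .gz file itself
gap <- read_csv(path)
```

The call is identical to reading an uncompressed file; only the file name differs.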
How To Trim Leading and Trailing Whitespace in loading csv file?
One of the biggest challenges in data wrangling is leading and trailing whitespace in the data. readr package’s read_csv() and other functions have the option “trim_ws”; when it is set to TRUE (the default for read_csv()), all the leading and trailing whitespace around each field is trimmed.
read_csv(file, trim_ws=TRUE)
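To see the effect, this sketch reads a made-up file with padded text fields twice, once with trimming and once without.

```r
library(readr)

# Hypothetical file whose name field carries stray spaces
path <- tempfile(fileext = ".csv")
writeLines(c("name,score",
             "  Alice  ,10",
             "Bob  ,20"), path)

# trim_ws = TRUE strips the padding around each field
trimmed <- read_csv(path, trim_ws = TRUE)

# trim_ws = FALSE keeps the fields exactly as written in the file
untrimmed <- read_csv(path, trim_ws = FALSE)

trimmed$name
untrimmed$name
```

Comparing the two name columns shows the padding that would otherwise end up in the data.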