Getting a quick look at a dataframe to understand its variables and their data types is an important first step in data analysis. If you are used to working with Excel, your first impulse may be to open the data in Excel. However, looking at the data programmatically in R has many advantages, including the safety of not changing the data file by mistake.
In this post, we will see three ways to get a peek at the data in a dataframe in R. We will first use tidyverse’s glimpse() function to get a glimpse of a dataframe, then see how to look at the top or bottom few rows of the dataframe, and finally see how to browse the data with the view() function in R.
Let us first load the tidyverse suite of R packages.
library("tidyverse")
We will use the fantastic Penguins dataset to illustrate the three ways to see data in a dataframe. Let us load the data from cmdlinetips.com’s GitHub page.
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )
We will see three different ways to get a quick look at a dataframe in R.
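The column specification that readr prints can also be passed in explicitly through the col_types argument, which fixes the types up front instead of letting readr guess them. The following is a minimal sketch under stated assumptions: it uses a tiny made-up CSV written to a temporary file (not the real penguins file) so that it runs without a network connection, and the name small_penguins is hypothetical.

```r
library(readr)

# Write a tiny two-row CSV to a temporary file. This is hypothetical
# sample data that only mimics a few of the penguins columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_length_mm,sex",
  "Adelie,Torgersen,39.1,male",
  "Adelie,Torgersen,39.5,female"
), tmp)

# Supplying col_types declares each column's type explicitly,
# so readr does not need to guess the types from the data.
small_penguins <- read_csv(
  tmp,
  col_types = cols(
    species = col_character(),
    island = col_character(),
    bill_length_mm = col_double(),
    sex = col_character()
  )
)

small_penguins
```

Declaring the types this way can be useful when a column that should be numeric would otherwise be guessed as character, or the other way around.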
1. glimpse(): Get a glimpse of the data and datatype
The glimpse() function in tidyverse is available through the tibble package (dplyr re-exports it as well) and is great for viewing the columns/variables in a dataframe. For each column, it shows the name, the data type, and as many of the first values as fit on one line.
glimpse(penguins)
Here is the output of the glimpse() function. It starts with the number of rows and columns, then lists each column on its own line.
## Rows: 344
## Columns: 7
## $ species           <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "…
## $ island            <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen",…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
## $ body_mass_g       <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
## $ sex               <chr> "male", "female", "female", NA, "female", "male", "…
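Because glimpse() returns its input invisibly, it can also be dropped into the middle of a pipeline to inspect an intermediate result. Here is a small sketch of that idea, assuming a made-up three-row tibble (the names df and kept are hypothetical and not part of the penguins example).

```r
library(dplyr)

# A small hypothetical data frame with three column types.
df <- tibble(
  id = 1:3,
  name = c("a", "b", "c"),
  score = c(9.5, NA, 7.25)
)

# glimpse() prints one line per column: name, type, and the first values.
glimpse(df)

# glimpse() returns its input invisibly, so the pipeline continues
# with the same data after printing the summary.
kept <- df %>%
  glimpse() %>%
  filter(!is.na(score))
```

In this sketch, kept holds the two rows with a non-missing score, while the glimpse() call in the middle only prints and leaves the data unchanged.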
2. head(): See the first n rows of a dataframe
The head() function lets you look at the top n rows of a dataframe. By default, it shows the first 6 rows.
head(penguins)
## # A tibble: 6 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
We can specify the number of rows we want to see in a dataframe with the argument “n”. In the example below, we use n=3 to look at the first three rows of a data frame.
head(penguins, n = 3)
## # A tibble: 3 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
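The n argument of head() also accepts a negative value, in which case it returns everything except the last |n| rows. The sketch below illustrates both forms on a made-up ten-row data frame (df and the result names are hypothetical).

```r
# A small hypothetical data frame with ten rows.
df <- data.frame(id = 1:10, value = (1:10)^2)

# Positive n: keep the first three rows (ids 1 to 3).
first_three <- head(df, n = 3)

# Negative n: keep all rows except the last two (ids 1 to 8).
all_but_last_two <- head(df, n = -2)

first_three
all_but_last_two
```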
2. tail(): See the last n rows of a dataframe
The function tail() is the counterpart to head(). tail() lets you take a look at the bottom n rows of a dataframe. As with head(), we can adjust the number of rows with the argument “n”.
tail(penguins)
## # A tibble: 6 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
## 1 Chinst… Dream            45.7          17                195        3650 fema…
## 2 Chinst… Dream            55.8          19.8              207        4000 male 
## 3 Chinst… Dream            43.5          18.1              202        3400 fema…
## 4 Chinst… Dream            49.6          18.2              193        3775 male 
## 5 Chinst… Dream            50.8          19                210        4100 male 
## 6 Chinst… Dream            50.2          18.7              198        3775 fema…
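Adjusting n for tail() works the same way as for head(), and the two functions can be combined to look at both ends of a dataframe at once. This is a minimal sketch on a made-up ten-row data frame (df, last_three, all_but_first_two, and both_ends are hypothetical names).

```r
# A small hypothetical data frame with ten rows.
df <- data.frame(id = 1:10, value = (1:10)^2)

# Positive n: keep the last three rows (ids 8 to 10).
last_three <- tail(df, n = 3)

# Negative n: keep all rows except the first two (ids 3 to 10).
all_but_first_two <- tail(df, n = -2)

# Stack the first two and last two rows to peek at both ends.
both_ends <- rbind(head(df, n = 2), tail(df, n = 2))

both_ends
```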
3. view(): View the data as table in RStudio
The third way to get a look at the data in a dataframe is to use the view() function. In RStudio, view() opens the dataframe in a separate tab in the source pane.
view(penguins)
It displays the data in a nice tabular form with the ability to sort columns. It is a bit like looking at the data in an Excel file, but in read-only mode.
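Since view() needs a viewer window, it is mainly useful in an interactive session such as RStudio. One possible pattern, sketched here on a made-up tibble (df, group_a, and the title text are hypothetical), is to filter the data first and then open only the subset of interest.

```r
library(dplyr)
library(tibble)

# A small hypothetical tibble with a grouping column.
df <- tibble(id = 1:5, group = c("a", "b", "a", "b", "a"))

# Narrow the data down before opening the viewer.
group_a <- df %>%
  filter(group == "a")

# Only open the viewer when running interactively (for example in RStudio);
# in a script run from the command line this block is skipped.
if (interactive()) {
  view(group_a, title = "Group a rows")
}
```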
This post is part of a series of posts covering tidyverse tips, tricks, and tutorials for learning data analysis and data munging skills in R with the tidyverse suite of R packages. Check here for more tidyverse 101 posts.