9 Ways To Create New Variables with tidyverse

Add New Variables With tidyverse
When you want to create a new variable in R with the tidyverse, dplyr’s mutate verb is probably the first thing that comes to mind, since it lets you add a new column on the fly. For many people it is the go-to command whenever a new variable is needed.

However, dplyr’s mutate is not the only way to create a new variable. The tidyverse has a host of commands that are useful for creating new variables in different scenarios.

In this post, we will see examples of 9 ways to create new variables with tidyverse.

Let us load the tidyverse packages and the gapminder package. We will use the gapminder data frame from the gapminder package.

library(tidyverse)
library(gapminder)

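Let us subset the gapminder data frame so that we have just three columns/variables and just 4 rows of data. Note that the code below overwrites the name gapminder with this small subset for the rest of the post.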

gapminder <- gapminder %>%
  select(country,year,pop) %>%
  head(n=4)

1. mutate

With the easy-to-use mutate verb, one can create a new variable, in this example pop_in_mill (the population in millions), from “pop” as follows. You can see that the resulting data frame has the new column “pop_in_mill” alongside the original columns.

gapminder %>%
  mutate(pop_in_mill= pop/1e06)

country year pop pop_in_mill
<fctr> <int> <dbl> <dbl>
Afghanistan	1952	8425333	8.425333	
Afghanistan	1957	9240934	9.240934	
Afghanistan	1962	10267083	10.267083	
Afghanistan	1967	11537966	11.537966	

2. transmute

Sometimes, one may want to create a new variable without keeping the original variables that are present in the data frame. In those cases, the relatively little-known dplyr verb transmute is very useful. In this example, we create a new variable “pop_in_mill” with transmute. Note that the resulting data frame contains only the new variable, nothing else.

gapminder %>% 
  transmute(pop_in_mill=pop/1e06)

pop_in_mill
<dbl>
8.425333				
9.240934				
10.267083				
11.537966	
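
If you want to keep a few of the original columns next to the new one, you can simply name them inside transmute. Here is a minimal sketch of that; gm_small is a hand-built stand-in for the four-row data frame used in this post, and gm_kept is just an illustrative name.

```r
library(dplyr)
library(tibble)

# Hand-built stand-in for the four-row gapminder subset used in this post
gm_small <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Columns named inside transmute() are kept; everything else is dropped
gm_kept <- gm_small %>%
  transmute(country, year, pop_in_mill = pop / 1e6)

names(gm_kept)
```

The original pop column is gone, while country and year survive because they were listed explicitly.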

3. mutate_at

dplyr also has the mutate_at verb, which can be very useful for changing specific columns in a data frame. In this simple example illustrating mutate_at, we specify the column we want to change and a function describing how to change it. Note that it does not create a new column name; instead, the transformed values replace the column in place under the same name.

gapminder %>% 
  mutate_at(c("pop"), function(x){x/1e6})

country year pop
<fctr> <int> <dbl>
Afghanistan	1952	8.425333		
Afghanistan	1957	9.240934		
Afghanistan	1962	10.267083		
Afghanistan	1967	11.537966	

The verb mutate_at can be extremely useful when you want to apply the same rule to multiple columns whose names share a pattern.
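
As a rough sketch of the pattern-based case, the example below rescales every column whose name starts with "pop_". The data frame pop_wide and its values are made up purely for illustration; vars() together with starts_with() does the column selection.

```r
library(dplyr)
library(tibble)

# Hypothetical wide data: two population columns sharing the "pop_" prefix
pop_wide <- tibble(
  country  = c("A", "B"),
  pop_1952 = c(2000000, 5000000),
  pop_1957 = c(2500000, 5500000)
)

# vars() accepts the same selection helpers as select()
pop_scaled <- pop_wide %>%
  mutate_at(vars(starts_with("pop_")), function(x) x / 1e6)

pop_scaled
```

Only the two matching columns are rescaled; country is left untouched.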

4. mutate_if

mutate_if is a very useful verb when one wants to check a condition on each column and change only the columns that meet it. In the toy example below, we use mutate_if to check whether a column is of integer type and, if so, convert it to character type.

Note that now the resulting data frame does not have any column with integer as type.

gapminder %>%
  mutate_if(is.integer, as.character)


country year pop
<fctr> <chr> <dbl>
Afghanistan	1952	8425333		
Afghanistan	1957	9240934		
Afghanistan	1962	10267083		
Afghanistan	1967	11537966	

5. mutate_all

mutate_all is another useful verb that can be used to change every column. In the example below, we change the type of every column to character, regardless of its initial type. We pass as.character directly, since the funs() wrapper has been deprecated since dplyr 0.8.0.

gapminder %>%
  mutate_all(as.character)

country year pop
<chr> <chr> <chr>
Afghanistan	1952	8425333		
Afghanistan	1957	9240934		
Afghanistan	1962	10267083		
Afghanistan	1967	11537966
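
Since dplyr 1.0.0, the scoped verbs mutate_at, mutate_if and mutate_all are superseded by across() used inside mutate. The sketch below shows what the three examples above might look like in that style, assuming dplyr 1.0.0 or later; gm_small is a hand-built stand-in for the four-row data frame, and the result names are illustrative.

```r
library(dplyr)
library(tibble)

# Hand-built stand-in for the four-row gapminder subset used in this post
gm_small <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Same effect as mutate_at(c("pop"), function(x){x/1e6})
at_equiv <- gm_small %>%
  mutate(across(pop, function(x) x / 1e6))

# Same effect as mutate_if(is.integer, as.character)
if_equiv <- gm_small %>%
  mutate(across(where(is.integer), as.character))

# Same effect as mutate_all(as.character)
all_equiv <- gm_small %>%
  mutate(across(everything(), as.character))
```

The scoped verbs still work, so this is a matter of style rather than a required change.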

6. add_column

Recent versions of tibble have a very convenient function called add_column() that adds a new column quickly on the fly. The add_column() function does not change the existing columns, and it cannot overwrite an existing column: trying to add a column whose name is already present results in an error.

gapminder %>% 
  add_column(id=1:4)
 
country year pop id
<fctr> <int> <dbl> <int>
1	Afghanistan	1952	8425333	1
2	Afghanistan	1957	9240934	2
3	Afghanistan	1962	10267083	3
4	Afghanistan	1967	11537966	4

The add_column() function also has the arguments .before and .after. One can use them to specify where the new column should be placed.
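
Here is a small sketch of both behaviours: positioning the new column with .before and .after, and the error you get when the name already exists. gm_small is a hand-built stand-in for the four-row data frame, and the result names are illustrative.

```r
library(dplyr)
library(tibble)

# Hand-built stand-in for the four-row gapminder subset used in this post
gm_small <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Put the new id column first
id_first <- gm_small %>%
  add_column(id = 1:4, .before = "country")

# Put the new id column right after country
id_second <- gm_small %>%
  add_column(id = 1:4, .after = "country")

# Adding a column whose name already exists is an error, not an overwrite
overwrite_attempt <- tryCatch(
  add_column(gm_small, pop = 0),
  error = function(e) "error"
)
```

Without .before or .after, the new column is appended at the end, as in the example above.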

7. add_count

add_count() is a very convenient function for quickly counting rows based on a variable. For example, if we want to add a column giving the number of rows for each value of the “country” variable, we can use add_count(country) as shown below. The add_count() function groups by country, tallies the rows in each group, and adds the count as a new column named “n”, while keeping every original row.

gapminder %>% 
  add_count(country)

country year pop n
<fctr> <int> <dbl> <int>
Afghanistan	1952	8425333	4	
Afghanistan	1957	9240934	4	
Afghanistan	1962	10267083	4	
Afghanistan	1967	11537966	4	
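
To make the "group, count, add a column" description concrete, here is a sketch that writes the same thing by hand with group_by and mutate, and also uses the name argument of add_count() to pick a different column name. gm_small is a hand-built stand-in for the four-row data frame; n_obs and the result names are illustrative.

```r
library(dplyr)
library(tibble)

# Hand-built stand-in for the four-row gapminder subset used in this post
gm_small <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# The shortcut
with_add_count <- gm_small %>%
  add_count(country)

# Roughly the same thing spelled out
by_hand <- gm_small %>%
  group_by(country) %>%
  mutate(n = n()) %>%
  ungroup()

# Use a custom name for the count column instead of "n"
named_count <- gm_small %>%
  add_count(country, name = "n_obs")
```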

8. add_tally

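The function add_tally() adds a column n to a table based on the number of items within each existing group. Since our data frame is not grouped, every row simply gets the total number of rows, which is 4 here.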

gapminder %>% 
  add_tally()

country year pop n
<fctr> <int> <dbl> <int>
Afghanistan	1952	8425333	4	
Afghanistan	1957	9240934	4	
Afghanistan	1962	10267083	4	
Afghanistan	1967	11537966	4
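
The "existing group" part is easier to see with more than one group. Below is a minimal sketch on a made-up data frame obs with two countries, comparing add_tally() with and without a preceding group_by.

```r
library(dplyr)
library(tibble)

# Hypothetical data with two groups of different sizes
obs <- tibble(
  country = c("A", "A", "A", "B"),
  year    = c(1952L, 1957L, 1962L, 1952L)
)

# No grouping: every row gets the total row count
ungrouped_tally <- obs %>%
  add_tally()

# Grouped by country: every row gets the size of its own group
grouped_tally <- obs %>%
  group_by(country) %>%
  add_tally()
```

In the grouped case the result stays grouped by country, so you may want to call ungroup() afterwards.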

9. rename

Often we would like to rename a column. This is not adding a new column per se; rather, an old column gets a new name. The rename function is very handy for making such column name changes.

One passes the new name as an argument to the rename function, set equal to the old name, as follows. Here “population” is the new name and “pop” is the old column name in the data frame.

gapminder %>% 
  rename(population=pop)

country year population
<fctr> <int> <dbl>
1	Afghanistan	1952	8425333	
2	Afghanistan	1957	9240934	
3	Afghanistan	1962	10267083	
4	Afghanistan	1967	11537966
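
rename also accepts several new_name = old_name pairs in one call. A short sketch, where gm_small is a hand-built stand-in for the four-row data frame and the new name yr is chosen only for illustration:

```r
library(dplyr)
library(tibble)

# Hand-built stand-in for the four-row gapminder subset used in this post
gm_small <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Each argument is new_name = old_name; column order is unchanged
gm_renamed <- gm_small %>%
  rename(population = pop, yr = year)

names(gm_renamed)
```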