When you want to create a new variable in R with the tidyverse, dplyr’s mutate verb is probably the first one that comes to mind: it lets you create a new column on the fly. For many people it is the go-to command whenever a new variable is needed.
However, dplyr’s mutate is not the only way to create a new variable. The tidyverse has a host of commands that are useful for creating new variables in different scenarios.
In this post, we will see examples of 9 ways to create new variables with tidyverse.
Let us load the tidyverse packages and the gapminder package. We will use the gapminder data frame from the gapminder package.
library(tidyverse)
library(gapminder)
Let us trim the gapminder data frame so that we have just three columns/variables and just 4 rows of data.
gapminder <- gapminder %>% select(country,year,pop) %>% head(n=4)
1. mutate
With the easy-to-use mutate verb, one can create a new variable. In this example, we create pop_in_mill from “pop” as follows. You can see that the resulting data frame has the new column “pop_in_mill”.
gapminder %>% mutate(pop_in_mill = pop/1e06)

country      year   pop       pop_in_mill
<fctr>       <int>  <dbl>     <dbl>
Afghanistan  1952   8425333   8.425333
Afghanistan  1957   9240934   9.240934
Afghanistan  1962   10267083  10.267083
Afghanistan  1967   11537966  11.537966
2. transmute
Sometimes, one may want to create a new variable without keeping the original variables that are present in the data frame. In those cases, the relatively unknown tidyverse verb transmute is very useful. In this example, we create a new variable “pop_in_mill” with transmute. Note that the resulting data contains only the new variable, nothing else.
gapminder %>% transmute(pop_in_mill = pop/1e06)

pop_in_mill
<dbl>
8.425333
9.240934
10.267083
11.537966
3. mutate_at
dplyr also has a mutate_at verb that can be very useful for changing specific columns in a data frame. In this simple example, we specify the column we want to change and a function describing how to change it. Note that it does not create a new column name; instead, the existing column is overwritten under the same name.
gapminder %>% mutate_at(c("pop"), function(x){x/1e6})

country      year   pop
<fctr>       <int>  <dbl>
Afghanistan  1952   8.425333
Afghanistan  1957   9.240934
Afghanistan  1962   10.267083
Afghanistan  1967   11.537966
The verb mutate_at can be extremely useful when you want to apply the same rule to multiple columns whose names share some pattern.
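The multi-column case mentioned above might look like the following minimal sketch. The data frame `wide` and its `pop_1952`/`pop_1957` columns are hypothetical, made up only to have several columns sharing a name pattern; `vars()` and `starts_with()` are the dplyr helpers used to pick them.

```r
library(dplyr)
library(tibble)

# Hypothetical data: two population columns sharing the "pop_" prefix
wide <- tibble(
  country  = c("A", "B"),
  pop_1952 = c(2000000, 5000000),
  pop_1957 = c(2500000, 6000000)
)

# Apply the same rule (convert to millions) to every column
# whose name starts with "pop_"; other columns are left untouched
scaled <- wide %>%
  mutate_at(vars(starts_with("pop_")), function(x) x / 1e6)

scaled
```

As in the single-column example, the selected columns keep their names and only their values change.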
4. mutate_if
mutate_if is a very useful verb when you want to check a condition and change a column only if the condition is met. In the dummy example below, we use mutate_if to check whether a column is of integer type and, if so, change it to character type.
Note that now the resulting data frame does not have any column with integer as type.
gapminder %>% mutate_if(is.integer, as.character)

country      year   pop
<fctr>       <chr>  <dbl>
Afghanistan  1952   8425333
Afghanistan  1957   9240934
Afghanistan  1962   10267083
Afghanistan  1967   11537966
5. mutate_all
mutate_all is another useful verb that can be used to change every column. In the example below, we change the type of every column to character, regardless of its initial type.
gapminder %>% mutate_all(as.character)

country      year   pop
<chr>        <chr>  <chr>
Afghanistan  1952   8425333
Afghanistan  1957   9240934
Afghanistan  1962   10267083
Afghanistan  1967   11537966
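In dplyr 1.0.0 and later, the scoped verbs mutate_at, mutate_if and mutate_all are superseded by across() used inside mutate. The sketch below shows one way the three examples above could be written in that style. The data frame `df` is a hand-built stand-in for the four-row gapminder subset, with `pop` stored as double to match the output shown in this post; the result names are illustrative.

```r
library(dplyr)
library(tibble)

# Stand-in for the four-row gapminder subset used in this post
df <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Counterpart of mutate_at(c("pop"), ...): change one named column
at_style <- df %>% mutate(across(pop, function(x) x / 1e6))

# Counterpart of mutate_if(is.integer, as.character):
# change only the columns that satisfy a predicate
if_style <- df %>% mutate(across(where(is.integer), as.character))

# Counterpart of mutate_all(as.character): change every column
all_style <- df %>% mutate(across(everything(), as.character))
```

The scoped verbs still work, so this is a matter of style for new code rather than a required change.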
6. add_column
Recent versions of tibble have a very convenient function called add_column() that helps add a new column quickly on the fly. The add_column() function will not change the existing data, and it cannot overwrite an existing column.
gapminder %>% add_column(id = 1:4)

   country      year   pop       id
   <fctr>       <int>  <dbl>     <int>
1  Afghanistan  1952   8425333   1
2  Afghanistan  1957   9240934   2
3  Afghanistan  1962   10267083  3
4  Afghanistan  1967   11537966  4
The add_column() function also has the arguments .before and .after. One can use them to specify where the new column should be placed.
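As a small sketch of the positioning arguments, the example below places the new column first and then right after “year”. The data frame `df` is a hand-built stand-in for the four-row gapminder subset, and the result names are illustrative. The last line also checks the point made earlier that add_column() refuses to overwrite an existing column.

```r
library(tibble)

# Stand-in for the four-row gapminder subset used in this post
df <- tibble(
  country = factor(rep("Afghanistan", 4)),
  year    = c(1952L, 1957L, 1962L, 1967L),
  pop     = c(8425333, 9240934, 10267083, 11537966)
)

# Put the new column before "country", i.e. in first position
id_first <- add_column(df, id = 1:4, .before = "country")

# Put the new column immediately after "year"
id_after_year <- add_column(df, id = 1:4, .after = "year")

# Trying to add a column named like an existing one raises an error
overwrite_fails <- inherits(
  try(add_column(df, pop = 0), silent = TRUE),
  "try-error"
)
```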
7. add_count
add_count() is a very convenient function that helps quickly count observations based on a variable. For example, if we want to add a column specifying the number of entries for each value of the “country” variable, we can use add_count(country) as shown below. The add_count() function will group_by each country and get a tally count. This count will be added as a new column named “n”.
gapminder %>% add_count(country)

country      year   pop       n
<fctr>       <int>  <dbl>     <int>
Afghanistan  1952   8425333   4
Afghanistan  1957   9240934   4
Afghanistan  1962   10267083  4
Afghanistan  1967   11537966  4
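With a single country every row gets the same count, so the grouping is easy to miss. The sketch below uses a small hypothetical data frame `two_countries` (made up for illustration, with two distinct country values) and compares add_count() with a hand-written group_by, mutate and ungroup pipeline that should give the same result.

```r
library(dplyr)
library(tibble)

# Hypothetical data with two countries, so the per-group counts differ
two_countries <- tibble(
  country = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania"),
  year    = c(1952L, 1957L, 1962L, 1952L)
)

# One step: count rows per country and attach the count as column "n"
counted <- two_countries %>% add_count(country)

# The same idea spelled out by hand
manual <- two_countries %>%
  group_by(country) %>%
  mutate(n = n()) %>%
  ungroup()

counted
```

Unlike count(), which collapses the data to one row per group, add_count() keeps every original row.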
8. add_tally
The function add_tally() adds a column n to a table based on the number of items within each existing group.
gapminder %>% add_tally()

country      year   pop       n
<fctr>       <int>  <dbl>     <int>
Afghanistan  1952   8425333   4
Afghanistan  1957   9240934   4
Afghanistan  1962   10267083  4
Afghanistan  1967   11537966  4
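Because add_tally() works on the existing groups, its result depends on whether the data frame has been grouped beforehand. The sketch below illustrates this with a small hypothetical data frame `two_countries` (made up for illustration); the result names are also illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical data with two countries
two_countries <- tibble(
  country = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania"),
  year    = c(1952L, 1957L, 1962L, 1952L)
)

# No grouping: the whole data frame is one group,
# so every row gets the total number of rows
tally_overall <- two_countries %>% add_tally()

# Grouped by country: each row gets the size of its own group
tally_by_country <- two_countries %>%
  group_by(country) %>%
  add_tally() %>%
  ungroup()
```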
9. rename
Often we would like to rename a column. This is not adding a new column per se; rather, an old column gets a new name. The rename function is very handy for making such column name changes.
One specifies the new name as the argument to the rename function and assigns the old name to it, as follows. Here “population” is the new name and “pop” is the old column in the data frame.
gapminder %>% rename(population = pop)

   country      year   population
   <fctr>       <int>  <dbl>
1  Afghanistan  1952   8425333
2  Afghanistan  1957   9240934
3  Afghanistan  1962   10267083
4  Afghanistan  1967   11537966