• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / R / tidyverse / Collapse columns in tidyverse / How To Collapse Multiple Text Columns in Dataframe Using Tidyverse?

How To Collapse Multiple Text Columns in Dataframe Using Tidyverse?

November 16, 2018 by cmdlinetips

Often you may have a data frame, where multiple columns are related and you may want to combine those related columns into a single column. In an earlier post, we saw how we can collapse a numerical data frame with related columns using Python.

In this post, we consider the problem of collapsing or combining multiple related text columns using tidyverse in R. A visual demonstration of the question showing a data frame before and after collapsing is depicted in the figure below.

How To Collapse Text Columns Tidyverse?
How To Collapse Text Columns Tidyverse?

Let us make a toy data frame with six columns and a column for IDs. The every other column is related, for example the columns S1_1 and S1_2 are related and S2_1 and S2_2 are related. Notice that each column has text/string. We would like to collapse or combine two similar columns. For example, we would like to collapse ‘A’ and ‘T’ in columns s_A and s_B into ‘AT’.

# load tidyverse
library(tidyverse)
# create row/sample IDs 
row_id <- c("s_A", "s_B", "s_C", "s_D", "s_E")
# create column variables
S1_1 <- c('A', 'T', 'G', 'C', 'A')
S1_2 <- c('G', 'C', 'A', 'T', 'A')
S2_1 <- c('A', 'T', 'G', 'C', 'A')
S2_2 <- c('G', 'T', 'G', 'T', 'A')
S3_1 <- c('G', 'C', 'A', 'T', 'A')
S3_2 <- c('A', 'C', 'G', 'C', 'A')

We have all the variables needed to create a dataframe. Now we can use data.frame function and
make a new data frame df.

# create data frame 
df <- data.frame(row_id, S1_1, S1_2, S2_1, S2_2, S3_1, S3_2,
                 stringsAsFactors=FALSE)
# view the data frame
df 

Let us use tidyverse, mainly functions from the packages tidyr and dplyr to collapse/combine multiple columns. At first we need to make our data frame tidy. When the data is “tidy”,

  • Each variable is in a column
  • Each observation is a row.
  • Each value is a cell.

We will start with the dataframe df and use magritter or pipe to give the dataframe content to tidyr’s gather function to make the data long from the wide form.

df %>%
  gather("col_ID", "Value",-row_id) %>%
  head(n=3)

Once we have the data in the long form, we can select certain columns and rename it for convenience. Here, we identify the group IDs of columns that are related using separate function in tidyr. We are splitting the column IDs by under score and keeping the ID representing the group.

df %>%
  gather("col_ID", "Value",-row_id) %>%
  separate(col_ID,c("col_group")) %>%
  head(n=3)

Now we can use group_by function on the group IDs, so that we combine with in each group.

df %>%
  gather("col_ID", "Value",-row_id) %>%
  separate(col_ID,c("col_group")) %>%
  group_by(row_id, col_group) %>%
  head(n=3)

Here we use summarize function to combine values from related columns using paste0.

df %>%
  gather("col_ID", "Value",-row_id) %>%
  separate(col_ID,c("col_group")) %>%
  group_by(row_id, col_group) %>%
  summarize(new_value=paste0(Value,collapse = "")) %>%
  head

Now we have the output we needed, but in long form, so we can use spread function from tidyr to reshape our data frame into wide form.

df %>%
  gather("col_ID", "Value",-row_id) %>%
  separate(col_ID,c("col_group")) %>%
  group_by(row_id, col_group) %>%
  summarize(new_value=paste0(Value,collapse = "")) %>%
  spread(col_group,new_value) %>%
  head

Voila.. we have the data frame we wanted! Tune in for another example with collapsing a data frame with numerical columns using tidyverse.


Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow to Split Text in a Column in Data Frame in R? Default ThumbnailHow to Collapse Multiple Columns in Pandas? Groupby with Dictionary Default ThumbnailHow To Drop Multiple Columns in Pandas Dataframe? Default ThumbnailHow to Fold a Long Single Line Text to Multiple Lines (in Python and Linux)?

Filed Under: Collapse columns in tidyverse, R Tips Tagged With: collapse columns in tidyverse, combine text columns

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version