How To Split A Column or Column Names in Pandas and Get Part of it?

June 15, 2018 by cmdlinetips

Often you may want to create a new variable either from the column names of a pandas data frame or from one of its columns. Just like Python, Pandas has great string manipulation abilities that let you manipulate strings easily.

Let us see an example of using Pandas to manipulate column names and a column. Let us first load Pandas and NumPy to create a Pandas data frame.

import pandas as pd
import numpy as np

Let us also create a small pandas data frame with five columns to work with. We can create the data frame from multiple lists. Here one of the columns holds sample IDs, which are two-part strings separated by an underscore “_”. Two columns are integers and the other two columns are random numbers generated by NumPy’s random module.

df = pd.DataFrame({'s_ID':['a_id','b_id','c_id','d_id'],
              'A_1': [1, 1, 3, 2],
              'A_2': [1, 2, 3, 4],
              'B_1': np.random.randn(4),
              'B_2': np.random.randn(4)})

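The column names of the data frame also have two parts separated by an underscore “_”. Note that the outputs below come from an older pandas version that sorted the dictionary keys alphabetically; recent versions keep the columns in the order they were given, so s_ID would appear first.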

	A_1	A_2	B_1	B_2	s_ID
0	1	1	0.310886	-1.105775	a_id
1	1	2	-0.580864	-2.012918	b_id
2	3	3	-0.153627	-0.013282	c_id
3	2	4	-1.532841	-0.757318	d_id

We can get the names of the columns as a list from a pandas data frame using

>df.columns.tolist()
['A_1', 'A_2', 'B_1', 'B_2', 's_ID']

To split the column names and get part of them, we can use the Pandas “str” accessor. The str accessor offers vectorized string operations for Series and Index objects. Accessing df.columns.str returns a StringMethods object that exposes these string methods.

>df.columns.str
<pandas.core.strings.StringMethods at 0x113ad2780>
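
As a minimal sketch of what "vectorized" means here, the snippet below applies a few str methods to every column name at once. The Index is built by hand to mirror the column names above, and the variable names are only illustrative.

```python
import pandas as pd

# Column names rebuilt by hand so the example runs on its own
cols = pd.Index(['A_1', 'A_2', 'B_1', 'B_2', 's_ID'])

# Each str method is applied to every element, with no explicit loop
lower = cols.str.lower()                  # lower-cased names
starts_with_a = cols.str.startswith('A')  # one boolean per name
lengths = cols.str.len()                  # number of characters per name

print(lower.tolist())
print(list(starts_with_a))
print(list(lengths))
```

The same methods are available on a string column through df['s_ID'].str.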

How to Get Part of Column Names in a Pandas Data Frame?

The Pandas str accessor has numerous useful methods and one of them is “split”. We can use str with split to get the first, second or nth part of the string. For example, to get the first part of the string, we will first split the string with a delimiter. Here each part of the string is separated by “_”, so we can split by “_”.

>df.columns.str.split('_')
Index([['A', '1'], ['A', '2'], ['B', '1'], ['B', '2'], ['s', 'ID']], dtype='object')

We can see that df.columns.str.split(“_”) returns a Pandas Index whose elements are lists, one list of parts per column name. We can easily get the first part of each list by using the str accessor again.

>df.columns.str.split('_').str[0]
Index(['A', 'A', 'B', 'B', 's'], dtype='object')
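
For comparison, here is a small sketch showing that the vectorized call gives the same result as an ordinary Python list comprehension over the column names. The Index is rebuilt by hand, and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Column names rebuilt by hand so the example runs on its own
cols = pd.Index(['A_1', 'A_2', 'B_1', 'B_2', 's_ID'])

# Vectorized: split every name, then take element 0 of each resulting list
vectorized = cols.str.split('_').str[0].tolist()

# Plain Python: the same logic written as an explicit loop
looped = [name.split('_')[0] for name in cols]

print(vectorized)
print(looped)
```

Both approaches work; the str accessor version keeps the result as a Pandas object until tolist() is called, which is convenient when chaining further operations.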

If we are interested in other parts of the string, we can pass a different index to the str accessor. For example, to get the second part of the string, after the underscore in the column names, we can use

>df.columns.str.split('_').str[1]
Index(['1', '2', '1', '2', 'ID'], dtype='object')
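
The indexing step can be sketched in two other ways: str.get(i) is the method form of str[i], and a negative index counts from the end of each list. The names below are hypothetical and include one three-part name so that the last part differs from the second part.

```python
import pandas as pd

# Hypothetical names; 's_ID_old' has three parts on purpose
names = pd.Index(['A_1', 'B_2', 's_ID_old'])
parts = names.str.split('_')

second = parts.str[1]          # second part of each name
second_get = parts.str.get(1)  # same result through the get() method
last = parts.str[-1]           # last part, however many parts there are

print(second.tolist())
print(second_get.tolist())
print(last.tolist())
```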

Note that using the str accessor to get part of a string in vectorized fashion returns a Pandas object: an Index here, since we started from the column names, and a Series when we start from a column. We can easily convert it to a list using the tolist() method.

>df.columns.str.split('_').str[0].tolist()
['A', 'A', 'B', 'B', 's']
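
One possible use of the extracted parts, sketched under the assumption that we want only the columns whose first part is 'A': compare the prefixes with 'A' and use the resulting boolean mask to select columns. The data frame is rebuilt with fixed numbers instead of random ones so the example is reproducible.

```python
import pandas as pd

# Same layout as the data frame above, with fixed values instead of random ones
df = pd.DataFrame({'s_ID': ['a_id', 'b_id', 'c_id', 'd_id'],
                   'A_1': [1, 1, 3, 2],
                   'A_2': [1, 2, 3, 4],
                   'B_1': [0.1, 0.2, 0.3, 0.4],
                   'B_2': [0.5, 0.6, 0.7, 0.8]})

# First part of every column name
prefix = df.columns.str.split('_').str[0]

# Boolean mask over the columns, used to keep only the 'A' columns
a_cols = df.loc[:, prefix == 'A']

print(a_cols.columns.tolist())
```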

How to Get Part of a Column in a Pandas Data Frame?

We just saw an example of how to get part of the column names of a data frame. We can also easily get part of any string column in a Pandas data frame. For example, to get the first part of the column ‘s_ID’ in our pandas data frame, we can use

>df.s_ID.str.split("_").str[0].tolist()
['a', 'b', 'c', 'd']
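
To keep the extracted part as a new variable rather than a list, the result can be assigned back to the data frame. This is a minimal sketch; the column name sample_prefix is a made-up name for illustration, and the data frame is a reduced copy of the one above.

```python
import pandas as pd

# Reduced copy of the data frame above, enough to show the idea
df = pd.DataFrame({'s_ID': ['a_id', 'b_id', 'c_id', 'd_id'],
                   'A_1': [1, 1, 3, 2]})

# Split each sample ID on the underscore and store the first part as a new column
# ('sample_prefix' is a hypothetical column name)
df['sample_prefix'] = df['s_ID'].str.split('_').str[0]

print(df)
```

Here df['s_ID'] is used instead of df.s_ID; both select the same column, but bracket notation also works for names that are not valid Python attributes.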

