One of the most common tasks when cleaning data or doing exploratory data analysis is fixing the column names or row names of a dataframe.
In this post, we will see:
- How to rename columns of a pandas dataframe
- How to change row names or row indexes of a pandas dataframe
Let us first load pandas.
# import pandas
>import pandas as pd
Let us use the gapminder data from the Software Carpentry website.
# link to gapminder data
>data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
>gapminder = pd.read_csv(data_url)
Let us check the column names and the first three rows of the dataframe using the head function.
>print(gapminder.head(3))
       country  year       pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333      Asia   28.801  779.445314
1  Afghanistan  1957   9240934      Asia   30.332  820.853030
2  Afghanistan  1962  10267083      Asia   31.997  853.100710
We can also use the columns attribute to get just the column names.
>gapminder.columns
Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')
1. How to Rename Columns in Pandas?
One can change the column names of a pandas dataframe in at least two ways. One way is to assign new names directly to df.columns.
For example, if you have the new column names in a list, you can assign that list to the column names directly. To change the columns of the gapminder dataframe, we assign the list of new column names to gapminder.columns:
>gapminder.columns = ['country','year','population', 'continent','life_exp','gdp_per_cap']
This assigns the names in the list as the column names of the gapminder dataframe. We can check that the dataframe has the new column names using the head() function.
>gapminder.head(3)
       country  year  population continent  life_exp  gdp_per_cap
0  Afghanistan  1952     8425333      Asia    28.801   779.445314
1  Afghanistan  1957     9240934      Asia    30.332   820.853030
2  Afghanistan  1962    10267083      Asia    31.997   853.100710
A problem with this approach is that one has to supply names for all the columns in the dataframe, in the right order. It does not work if we supply just one name in order to change a single column.
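To make this limitation concrete, here is a minimal sketch. It uses a small hand-built dataframe as a stand-in for gapminder (an assumption, so that the example runs without downloading anything). Assigning a list whose length differs from the number of columns raises a ValueError and leaves the column names as they were.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: values copied from the first rows shown above
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],
    "year": [1952, 1957],
    "pop": [8425333, 9240934],
})

# Assigning one name per column works
df.columns = ["country", "year", "population"]

# Assigning fewer names than there are columns fails
try:
    df.columns = ["population"]
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(list(df.columns))
print(type(mismatch_error).__name__)
```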
2. Pandas rename function to Rename Columns
Another way to change column names in pandas is to use the rename function. It is more flexible than assigning to df.columns: one can change the names of specific columns easily, and not all the column names need to be changed.
To change column names using the rename function, one needs to specify a mapper: a dictionary with the old names as keys and the new names as values. Here is an example, starting from the original gapminder column names, that changes several column names using a dictionary. We also use inplace=True to change the column names in place.
>gapminder.rename(columns={'pop':'population', 'lifeExp':'life_exp', 'gdpPercap':'gdp_per_cap'}, inplace=True)
>print(gapminder.columns)
Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_per_cap'], dtype='object')
>gapminder.head(3)
       country  year  population continent  life_exp  gdp_per_cap
0  Afghanistan  1952     8425333      Asia    28.801   779.445314
1  Afghanistan  1957     9240934      Asia    30.332   820.853030
2  Afghanistan  1962    10267083      Asia    31.997   853.100710
One of the biggest advantages of the rename function is that we can change as many or as few column names as we want.
Let us change the name of a single column, starting again from a freshly loaded gapminder dataframe with its original column names.
>gapminder.rename(columns={'pop':'population'}, inplace=True)
>print(gapminder.columns)
Index(['country', 'year', 'population', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')
>gapminder.head(3)
       country  year  population continent  lifeExp   gdpPercap
0  Afghanistan  1952     8425333      Asia   28.801  779.445314
1  Afghanistan  1957     9240934      Asia   30.332  820.853030
2  Afghanistan  1962    10267083      Asia   31.997  853.100710
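Two details of rename are worth knowing. Without inplace=True it returns a new dataframe and leaves the original untouched, and by default it silently ignores dictionary keys that match no existing column. The sketch below illustrates both on a small hand-built dataframe (an assumption standing in for gapminder). It also uses the errors='raise' option, which should be available in pandas 0.25 and later.

```python
import pandas as pd

# Hypothetical stand-in for one row of gapminder
df = pd.DataFrame({"country": ["Afghanistan"], "pop": [8425333], "lifeExp": [28.801]})

# Without inplace=True, rename returns a new dataframe; df keeps its old names
renamed = df.rename(columns={"pop": "population"})

# A key that matches no column is ignored by default
unchanged = df.rename(columns={"no_such_column": "x"})

# errors="raise" turns a non-matching key into a KeyError
try:
    df.rename(columns={"no_such_column": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(renamed.columns))
print(list(df.columns))
print(raised)
```

This is why a rename that targets a name that has already been changed appears to do nothing rather than fail.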
The rename function can also take a function instead of a dictionary. For example, we can write a lambda function that keeps only the first three characters of each current column name.
>gapminder.rename(columns=lambda x: x[0:3], inplace=True)
>gapminder.head(3)
           cou   yea       pop   con     lif         gdp
0  Afghanistan  1952   8425333  Asia  28.801  779.445314
1  Afghanistan  1957   9240934  Asia  30.332  820.853030
2  Afghanistan  1962  10267083  Asia  31.997  853.100710
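Any callable that takes one column name and returns a new one can serve as the mapper. As a sketch, the example below applies a hypothetical helper, to_snake_case (not part of pandas), alongside the three-character lambda from above, on a small hand-built dataframe standing in for gapminder.

```python
import re
import pandas as pd

def to_snake_case(name):
    # Hypothetical helper: put "_" before each capital letter (except at the start), then lower-case
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

# Hypothetical stand-in for one row of gapminder
df = pd.DataFrame({"country": ["Afghanistan"], "lifeExp": [28.801], "gdpPercap": [779.445314]})

# The function is called once per column name
snake = df.rename(columns=to_snake_case)
short = df.rename(columns=lambda x: x[0:3])

print(list(snake.columns))  # ['country', 'life_exp', 'gdp_percap']
print(list(short.columns))  # ['cou', 'lif', 'gdp']
```

Note that this helper gives gdp_percap rather than gdp_per_cap, because it only splits at capital letters. A dictionary is still the better choice when the new names do not follow a simple rule.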
How To Change Row Names/Indexes in Pandas?
Another good thing about the rename function is that we can also use it to change row indexes or row names.
We just need to use the index argument, which specifies that we want to change the index and not the columns.
For example, to change row names 0 and 1 to 'zero' and 'one' in a freshly loaded gapminder dataframe, we construct a dictionary with the old row index names as keys and the new row index names as values.
>gapminder.rename(index={0:'zero',1:'one'}, inplace=True)
>print(gapminder.head(4))
          country  year       pop continent  lifeExp   gdpPercap
zero  Afghanistan  1952   8425333      Asia   28.801  779.445314
one   Afghanistan  1957   9240934      Asia   30.332  820.853030
2     Afghanistan  1962  10267083      Asia   31.997  853.100710
3     Afghanistan  1967  11537966      Asia   34.020  836.197138
We can see that only the first two rows have new names, as we intended.
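Once rows are renamed, the new labels can be used for label-based selection with .loc. Here is a minimal sketch on a small hand-built dataframe (an assumption standing in for gapminder):

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of gapminder
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
})

# Rename only the row labels 0 and 1; label 2 is left as it is
renamed = df.rename(index={0: "zero", 1: "one"})

print(list(renamed.index))        # ['zero', 'one', 2]
print(renamed.loc["zero", "year"])
```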
How To Change Column Names and Row Indexes Simultaneously in Pandas?
With the rename function, one can also change column names and row names at the same time by passing both the columns and index arguments, each with its own mapper dictionary.
Let us change the column name 'lifeExp' to 'life_exp' and the row indices 0 and 1 to 'zero' and 'one'.
>gapminder.rename(columns={'lifeExp':'life_exp'}, index={0:'zero',1:'one'}, inplace=True)
>print(gapminder.head(4))
          country  year       pop continent  life_exp   gdpPercap
zero  Afghanistan  1952   8425333      Asia    28.801  779.445314
one   Afghanistan  1957   9240934      Asia    30.332  820.853030
2     Afghanistan  1962  10267083      Asia    31.997  853.100710
3     Afghanistan  1967  11537966      Asia    34.020  836.197138
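The same combined rename can be written without inplace=True, which keeps the original dataframe available for comparison. The sketch below uses a small hand-built dataframe as a stand-in for gapminder (an assumption) and then looks up a value through the new row and column labels.

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of gapminder
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
})

# Rename one column and two row labels in a single call; df itself is not modified
renamed = df.rename(columns={"lifeExp": "life_exp"}, index={0: "zero", 1: "one"})

print(list(renamed.columns))  # ['country', 'year', 'life_exp']
print(list(renamed.index))    # ['zero', 'one', 2]
print(renamed.loc["one", "life_exp"])
```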