How To Change Column Names and Row Indexes in Pandas?

One of the most common operations in data cleaning or exploratory data analysis is fixing the column names or row names of a dataframe.

In this post, we will see

  1. How to rename columns of pandas dataframe?
  2. How to change row names or row indexes of a pandas dataframe?

Let us first load pandas.

# import pandas
>import pandas as pd

Let us use the gapminder data from the Software Carpentry website.

# link to gapminder data
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
>gapminder = pd.read_csv(data_url)

Let us check the column names and the first three rows of the dataframe using the head function.

>print(gapminder.head(3))
      country  year       pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333      Asia   28.801  779.445314
1  Afghanistan  1957   9240934      Asia   30.332  820.853030
2  Afghanistan  1962  10267083      Asia   31.997  853.100710

We can also use the columns attribute to get the column names.

>gapminder.columns
Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')

1. How to Rename Columns in Pandas?

One can change the column names of a pandas dataframe in at least two ways. One way to rename columns in Pandas is to assign the new names directly to df.columns.

For example, if you have the names of columns in a list, you can assign the list to column names directly.

To change the columns of gapminder dataframe, we can assign the list of new column names to gapminder.columns as

>gapminder.columns = ['country','year','population',
                     'continent','life_exp','gdp_per_cap']

This assigns the names in the list as the column names of the dataframe “gapminder”. We can check that the dataframe has the new column names using the head() function.

>gapminder.head(3)
       country  year  population continent  life_exp  gdp_per_cap
0  Afghanistan  1952     8425333      Asia    28.801   779.445314
1  Afghanistan  1957     9240934      Asia    30.332   820.853030
2  Afghanistan  1962    10267083      Asia    31.997   853.100710

A problem with this approach is that one has to supply names for all the columns in the dataframe. It is a poor fit if we want to change the name of just one column.
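To make this limitation concrete, here is a minimal sketch. It uses a small stand-in dataframe built from the three gapminder rows shown above (the name df is our own, not from the original data), so it runs without downloading anything. Assigning a list of the wrong length fails, and changing a single name by assignment means rebuilding the whole list.

```python
import pandas as pd

# Small stand-in for the gapminder data: the three rows printed above
df = pd.DataFrame({
    'country': ['Afghanistan'] * 3,
    'year': [1952, 1957, 1962],
    'pop': [8425333, 9240934, 10267083],
    'continent': ['Asia'] * 3,
    'lifeExp': [28.801, 30.332, 31.997],
    'gdpPercap': [779.445314, 820.853030, 853.100710],
})

# Assigning a list with the wrong number of names raises a ValueError
try:
    df.columns = ['population']
    length_error = None
except ValueError as err:
    length_error = err

# To change one name by assignment, we have to rebuild the complete list
df.columns = ['population' if name == 'pop' else name for name in df.columns]
print(df.columns.tolist())
# ['country', 'year', 'population', 'continent', 'lifeExp', 'gdpPercap']
```

The list comprehension works, but it is more ceremony than the task deserves when only one name changes.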

2. Pandas rename function to Rename Columns

Another way to change column names in pandas is to use the rename function. It is more flexible than assigning to df.columns: one can change the names of specific columns and leave the rest untouched.

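To change column names using the rename function in Pandas, one needs to specify a mapper, a dictionary with old names as keys and new names as values. Here is an example that changes many column names using a dictionary. We will also use inplace=True to change column names in place. Note that this example, like the ones that follow, starts from a freshly loaded gapminder dataframe with its original column names.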

>gapminder.rename(columns={'pop':'population',
                          'lifeExp':'life_exp',
                          'gdpPercap':'gdp_per_cap'}, 
                 inplace=True)

>print(gapminder.columns)

Index(['country', 'year', 'population', 'continent', 'life_exp',
       'gdp_per_cap'],
      dtype='object')

>gapminder.head(3)

       country  year  population continent  life_exp  gdp_per_cap
0  Afghanistan  1952     8425333      Asia    28.801   779.445314
1  Afghanistan  1957     9240934      Asia    30.332   820.853030
2  Afghanistan  1962    10267083      Asia    31.997   853.100710

One of the biggest advantages of the rename function is that we can use it to change as many or as few column names as we want.

Let us change the name of a single column.

>gapminder.rename(columns={'pop':'population'}, inplace=True)

>print(gapminder.columns)
Index(['country', 'year', 'population', 'continent', 'lifeExp',
       'gdpPercap'],
      dtype='object')

>gapminder.head(3)
       country  year  population continent  lifeExp   gdpPercap
0  Afghanistan  1952     8425333      Asia   28.801  779.445314
1  Afghanistan  1957     9240934      Asia   30.332  820.853030
2  Afghanistan  1962    10267083      Asia   31.997  853.100710
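Two details of rename are worth a quick sketch. Without inplace=True it returns a new dataframe and leaves the original alone, and by default a key that matches no column is silently ignored. The errors='raise' argument turns that silent no-op into an error. The small dataframe df and the misspelled key 'popn' below are our own, made up for illustration.

```python
import pandas as pd

# One-row stand-in for the gapminder data
df = pd.DataFrame({'country': ['Afghanistan'], 'year': [1952], 'pop': [8425333]})

# Without inplace=True, rename returns a new dataframe; df itself is unchanged
renamed = df.rename(columns={'pop': 'population'})
print(df.columns.tolist())       # ['country', 'year', 'pop']
print(renamed.columns.tolist())  # ['country', 'year', 'population']

# A key that matches no column ('popn' is a deliberate typo) is ignored by default
unchanged = df.rename(columns={'popn': 'population'})

# errors='raise' makes the same call fail with a KeyError instead
try:
    df.rename(columns={'popn': 'population'}, errors='raise')
    missing_key_error = None
except KeyError as err:
    missing_key_error = err
```

This matters when rename is run on a dataframe whose columns have already been renamed: the call succeeds but changes nothing.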

Pandas rename function can also take a function as input instead of a dictionary. For example, we can write a lambda function that takes each current column name and keeps only its first four characters as the new column name.

>gapminder.rename(columns=lambda x: x[0:4], inplace=True)

>gapminder.head(3)
          coun  year       pop  cont    life        gdpP
0  Afghanistan  1952   8425333  Asia  28.801  779.445314
1  Afghanistan  1957   9240934  Asia  30.332  820.853030
2  Afghanistan  1962  10267083  Asia  31.997  853.100710 
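Any callable that takes a column name and returns a new one can serve as the mapper, not only a lambda. As a sketch, the helper to_snake_case below is a hypothetical function of our own (not part of pandas) that turns camelCase names into snake_case, and str.upper is a built-in used the same way.

```python
import re

import pandas as pd


def to_snake_case(name):
    """Hypothetical helper: put an underscore before each inner capital, then lowercase."""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


# One-row stand-in for the gapminder data
df = pd.DataFrame({
    'country': ['Afghanistan'],
    'lifeExp': [28.801],
    'gdpPercap': [779.445314],
})

# The function is called once per column name
snake = df.rename(columns=to_snake_case)
print(snake.columns.tolist())  # ['country', 'life_exp', 'gdp_percap']

# Built-in string methods work as mappers too
upper = df.rename(columns=str.upper)
print(upper.columns.tolist())  # ['COUNTRY', 'LIFEEXP', 'GDPPERCAP']
```

A function mapper is handy when the same rule applies to every column and writing out a dictionary would be tedious.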

How To Change Row Names/Indexes in Pandas?

Another good thing about the pandas rename function is that we can also use it to change row indexes or row names.

We just need to use the index argument to specify that we want to change the index, not the columns.

For example, to change row names 0 and 1 to ‘zero’ and ‘one’ in our gapminder dataframe, we will construct a dictionary with old row index names as keys and new row index as values.

>gapminder.rename(index={0:'zero',1:'one'}, inplace=True)
>print(gapminder.head(4))

          country  year       pop continent  lifeExp   gdpPercap
zero  Afghanistan  1952   8425333      Asia   28.801  779.445314
one   Afghanistan  1957   9240934      Asia   30.332  820.853030
2     Afghanistan  1962  10267083      Asia   31.997  853.100710
3     Afghanistan  1967  11537966      Asia   34.020  836.197138

We can see that just the first two rows have new names, as we intended.
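The index argument accepts a function as well as a dictionary, which is useful when every row label should change. A small sketch, assuming a stand-in dataframe df and a 'row_' prefix of our own choosing:

```python
import pandas as pd

# Stand-in for the first three gapminder rows
df = pd.DataFrame({
    'country': ['Afghanistan'] * 3,
    'year': [1952, 1957, 1962],
})

# The function is applied to every row label; here we prefix each with 'row_'
labelled = df.rename(index=lambda i: f'row_{i}')
print(labelled.index.tolist())  # ['row_0', 'row_1', 'row_2']

# Without inplace=True the original index is untouched
print(df.index.tolist())  # [0, 1, 2]
```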

How To Change Column Names and Row Indexes Simultaneously in Pandas?

With pandas’ rename function, one can also change both column names and row names simultaneously by passing both the columns and index arguments, each with its own mapper dictionary.

Let us change the column name “lifeExp” to “life_exp” and also the row indices 0 and 1 to “zero” and “one”.

>gapminder.rename(columns={'lifeExp':'life_exp'}, 
                 index={0:'zero',1:'one'}, 
                 inplace=True)
>print(gapminder.head(4))

          country  year       pop continent  life_exp   gdpPercap
zero  Afghanistan  1952   8425333      Asia    28.801  779.445314
one   Afghanistan  1957   9240934      Asia    30.332  820.853030
2     Afghanistan  1962  10267083      Asia    31.997  853.100710
3     Afghanistan  1967  11537966      Asia    34.020  836.197138
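As a quick check that both renames took effect, we can look up a value by its new row and column labels with .loc. This is a sketch on a stand-in dataframe (df and renamed are our own names), using the non-inplace form so that the original stays available.

```python
import pandas as pd

# Stand-in for the first three gapminder rows
df = pd.DataFrame({
    'country': ['Afghanistan'] * 3,
    'year': [1952, 1957, 1962],
    'lifeExp': [28.801, 30.332, 31.997],
})

# Rename one column and two row labels in a single call
renamed = df.rename(columns={'lifeExp': 'life_exp'},
                    index={0: 'zero', 1: 'one'})

# The new labels can be used together for label-based lookup
first_life_exp = renamed.loc['zero', 'life_exp']
print(first_life_exp)            # 28.801
print(renamed.index.tolist())    # ['zero', 'one', 2]
print(renamed.columns.tolist())  # ['country', 'year', 'life_exp']
```

Note that the index now mixes strings and integers, since only the labels named in the dictionary are replaced.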