Cleaning up the column names of a dataframe can often save a lot of headaches during data analysis. In this post, we will learn how to change the column names of a Pandas dataframe to lowercase. We will then do some additional cleanup and see how to remove leading and trailing spaces from column names.
Let us load Pandas and the Poisson distribution from scipy.stats.
import pandas as pd
from scipy.stats import poisson
We will create a toy dataframe with three columns. We will first give the dataframe’s columns uppercase names.
c1 = poisson.rvs(mu=10, size=5)
c2 = poisson.rvs(mu=15, size=5)
c3 = poisson.rvs(mu=20, size=5)
df = pd.DataFrame({"COLUMN1": c1, "COLUMN2": c2, "COLUMN3": c3})
Our dataframe’s column names are all in uppercase.
df.head()
   COLUMN1  COLUMN2  COLUMN3
0       16       12       16
1       12       14       11
2       15       15       23
3        8       14       24
4       11       15       32
How To Convert Pandas Column Names to lowercase?
We can convert the names to lowercase using Pandas’ str.lower() method, which is available through the .str accessor on df.columns. We first take the column names and convert them to lowercase.
We then assign the lowercase names back to df.columns, so that the dataframe’s column names are all in lowercase.
df.columns = df.columns.str.lower()
df.columns
Index(['column1', 'column2', 'column3'], dtype='object')
Now our dataframe looks like this, with lowercase column names:
df.head()
   column1  column2  column3
0       16       12       16
1       12       14       11
2       15       15       23
3        8       14       24
4       11       15       32
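The .str accessor used above assumes that the column labels are strings. As a minimal sketch for a dataframe whose labels are not all strings, a plain list comprehension can lowercase only the string labels and leave the rest alone. The dataframe df_mixed and its labels are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical dataframe with one string label and one integer label.
df_mixed = pd.DataFrame({"NAME": ["a", "b"], 2020: [1, 2]})

# Lowercase string labels only; non-string labels are kept unchanged.
df_mixed.columns = [
    label.lower() if isinstance(label, str) else label
    for label in df_mixed.columns
]

print(list(df_mixed.columns))  # ['name', 2020]
```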
Cleaning up Pandas Column Names
In addition to converting to lowercase, we may want to clean up the names by removing any leading and trailing spaces in the column names.
Let us create a toy dataframe with column names having leading/trailing spaces.
df=pd.DataFrame({" C1 ":c1, "C2":c2, "C3 ":c3})
By inspecting column names we can see the spaces.
df.columns
Index([' C1 ', 'C2', 'C3 '], dtype='object')
Note the spaces on both sides of the first column name and the trailing space in the third. We can use Pandas’ str.strip() method to strip the leading and trailing whitespace. Here we also convert the column names to lowercase using str.lower(), as before.
df.columns = df.columns.str.strip().str.lower()
df.columns
Index(['c1', 'c2', 'c3'], dtype='object')
We chain the two string methods to do both steps at once and re-assign the cleaned column names. The dataframe now looks like this:
df
   c1  c2  c3
0  16  12  16
1  12  14  11
2  15  15  23
3   8  14  24
4  11  15  32
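The strip-and-lowercase step can also be wrapped in a small helper so that it is easy to reuse. The sketch below is one possible way to do that; the function name clean_column_names and the small dataframe raw are hypothetical, and the helper assumes that all column labels are strings.

```python
import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with stripped, lowercased column names."""
    cleaned = df.copy()
    # Same chaining as above: strip surrounding whitespace, then lowercase.
    cleaned.columns = cleaned.columns.str.strip().str.lower()
    return cleaned


# Hypothetical data with messy column names.
raw = pd.DataFrame({" C1 ": [1, 2], "C2": [3, 4], "C3 ": [5, 6]})
tidy = clean_column_names(raw)

print(list(tidy.columns))  # ['c1', 'c2', 'c3']
print(list(raw.columns))   # [' C1 ', 'C2', 'C3 ']
```

Because the helper works on a copy, the original dataframe keeps its column names.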
Convert Pandas Column Names to lowercase with Pandas rename()
A more compact way to change a dataframe’s column names to lowercase is to use Pandas’ rename() function. Here we pass the built-in str.lower function as the columns argument.
df.rename(columns=str.lower)
   c1  c2  c3
0  16  12  16
1  12  14  11
2  15  15  23
3   8  14  24
4  11  15  32
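Since the columns argument of rename() accepts any function that maps an old name to a new one, stripping and lowercasing can be combined in one call. The following is a minimal sketch; the dataframe df_spaces and the lambda are illustrative and assume string column labels.

```python
import pandas as pd

# Hypothetical dataframe with uppercase names and stray spaces.
df_spaces = pd.DataFrame({" C1 ": [1, 2], "C2": [3, 4], "C3 ": [5, 6]})

# The function is applied to each column name in turn.
df_clean = df_spaces.rename(columns=lambda name: name.strip().lower())

print(list(df_clean.columns))  # ['c1', 'c2', 'c3']
```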
By default, Pandas’ rename() function uses inplace=False and returns a new dataframe, leaving the original unchanged. Therefore, if we want to keep the changes, we should either use inplace=True or assign the result to a variable.
df.rename(columns=str.lower, inplace=True)
df.head()
   column1  column2  column3
0       11       22       18
1        9        7       20
2        4       12       18
3       13       19       24
4       11       14       13
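The difference between the two options can be checked directly. The sketch below uses a small made-up dataframe, df_demo, to show that rename() without inplace=True leaves the original untouched, while inplace=True modifies it.

```python
import pandas as pd

# Hypothetical dataframe with uppercase column names.
df_demo = pd.DataFrame({"COLUMN1": [1, 2], "COLUMN2": [3, 4]})

# Default (inplace=False): a new dataframe is returned.
renamed = df_demo.rename(columns=str.lower)
print(list(df_demo.columns))  # ['COLUMN1', 'COLUMN2']
print(list(renamed.columns))  # ['column1', 'column2']

# inplace=True: df_demo itself is modified.
df_demo.rename(columns=str.lower, inplace=True)
print(list(df_demo.columns))  # ['column1', 'column2']
```

Assigning the result to a variable, as in the first call, tends to be easier to follow in longer chains of operations.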