How to Select Numerical Columns from a Pandas Dataframe

In this post, we will learn how to use Pandas to select columns based on their data types. For example, if we have a Pandas dataframe with columns of multiple data types, such as numeric and object, we will learn how to select only the columns that are numeric.

We can use Pandas’ select_dtypes() function and specify which data types to include or exclude. This allows us to select or ignore columns by their data types.
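Before moving to the real data, here is a minimal sketch of the idea on a small made-up dataframe. The column names and values are hypothetical and are not taken from the tuition data set used in the rest of this post.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate include/exclude.
toy = pd.DataFrame({
    "name": ["College A", "College B", "College C"],
    "students": [120, 85, 240],
    "tuition": [2380.0, 34850.0, 4128.0],
})

# Keep only the numeric columns.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Keep everything except the numeric columns.
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```

Both calls return a new dataframe, so the original dataframe is left unchanged.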

Let us load Pandas and check its version.

import numpy as np
import pandas as pd
pd.__version__
1.0.0

We will use the College Tuition data set from the tidytuesday project. We will load the data by reading it directly from tidytuesday’s website.

data_url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/tuition_cost.csv"
df = pd.read_csv(data_url)
df.iloc[0:5,0:3]

It contains columns with multiple data types. Here are the first few rows and the first three columns of the data frame.

name	state	state_code
0	Aaniiih Nakoda College	Montana	MT
1	Abilene Christian University	Texas	TX
2	Abraham Baldwin Agricultural College	Georgia	GA
3	Academy College	Minnesota	MN
4	Academy of Art University	California	CA
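To see which data type each column has before selecting anything, we can look at the dtypes attribute of the dataframe. The sketch below uses a small made-up dataframe (hypothetical names and values) so that it runs without downloading anything; the same attribute is available on df.

```python
import pandas as pd

# Hypothetical toy data, not the tuition data set.
toy = pd.DataFrame({
    "name": ["College A", "College B", "College C"],
    "students": [120, 85, 240],
    "tuition": [2380.0, 34850.0, 4128.0],
})

# dtypes is a Series mapping each column name to its data type.
dtype_by_column = toy.dtypes
print(dtype_by_column)
```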

Select Columns that are numeric from Pandas dataframe

If we want to select columns that are integers or floats (anything numeric), we can use the include argument of the select_dtypes() function and specify include='number' as shown below.


df.select_dtypes(include='number').head()

This excludes any non-numeric columns and gives us only the columns that are numeric.

room_and_board	in_state_tuition	in_state_total	out_of_state_tuition	out_of_state_total
0	NaN	2380	2380	2380	2380
1	10350.0	34850	45200	34850	45200
2	8474.0	4128	12602	12550	21024
3	NaN	17661	17661	17661	17661
4	16648.0	27810	44458	27810	44458
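In the output above, room_and_board is shown with decimals and missing values while the other columns are whole numbers, which suggests a mix of float and integer columns. If we ever need only one of these kinds, the include argument also accepts a list of specific data types. Here is a small sketch on made-up data; the column names mirror the ones above, but the dataframe itself is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that mimics a float column with a missing value
# next to an integer column.
toy = pd.DataFrame({
    "name": ["College A", "College B"],
    "room_and_board": [np.nan, 10350.0],
    "in_state_tuition": [2380, 34850],
})

# Select only 64-bit float columns.
float_cols = toy.select_dtypes(include=["float64"]).columns.tolist()

# Select only 64-bit integer columns.
int_cols = toy.select_dtypes(include=["int64"]).columns.tolist()

print(float_cols)
print(int_cols)
```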

Select Columns that are Non-numeric from Pandas dataframe

Similarly, if we want to select columns that are non-numeric, which in this data set are all of type “object”, we can use the select_dtypes() function with include='object'.


df.select_dtypes(include='object').head()

And we get columns that are of type “object”.

name	state	state_code	type	degree_length
0	Aaniiih Nakoda College	Montana	MT	Public	2 Year
1	Abilene Christian University	Texas	TX	Private	4 Year
2	Abraham Baldwin Agricultural College	Georgia	GA	Public	2 Year
3	Academy College	Minnesota	MN	For Profit	2 Year
4	Academy of Art University	California	CA	For Profit	4 Year

How to exclude columns of certain datatypes from Pandas dataframe

We can also filter columns the other way around with the select_dtypes() function, using the argument exclude instead of include.

For example, to select columns that are not of type object, we can use select_dtypes() with exclude='object'.



df.select_dtypes(exclude='object').head()

In this case, excluding the columns of type object leaves us with only the numerical columns.

room_and_board	in_state_tuition	in_state_total	out_of_state_tuition	out_of_state_total
0	NaN	2380	2380	2380	2380
1	10350.0	34850	45200	34850	45200
2	8474.0	4128	12602	12550	21024
3	NaN	17661	17661	17661	17661
4	16648.0	27810	44458	27810	44458
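Note that exclude='object' and include='number' give the same result here only because every non-object column in this data set happens to be numeric. The sketch below uses a made-up dataframe with a boolean column to show where the two can differ. The names are hypothetical, and the name column is given dtype=object explicitly as an assumption, so that the example does not depend on how a particular Pandas version stores strings.

```python
import pandas as pd

# Hypothetical toy data with an object, a boolean and an integer column.
toy = pd.DataFrame({
    "name": pd.Series(["College A", "College B"], dtype=object),
    "is_public": [True, False],
    "tuition": [2380, 34850],
})

# Excluding object columns keeps the boolean column as well.
not_object = toy.select_dtypes(exclude="object").columns.tolist()

# Including only numbers drops the boolean column.
only_numbers = toy.select_dtypes(include="number").columns.tolist()

print(not_object)
print(only_numbers)
```

So when the goal is specifically numeric columns, include='number' is the more direct choice.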

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.