In this post, we will learn how to use Pandas to select columns based on their datatypes. For example, if we have a Pandas dataframe with columns of multiple data types, such as numeric and object, we will learn how to select only the columns that are numeric.
We can use Pandas' select_dtypes() function and specify which data types to include or exclude. This allows us to select or ignore columns by their data types.
Let us load Pandas and check its version.
import numpy as np
import pandas as pd
pd.__version__
1.0.0
We will use the College Tuition data set from the tidytuesday project. We will load the data by reading it directly from tidytuesday's website.
data_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/tuition_cost.csv"
df = pd.read_csv(data_url)
df.iloc[0:5, 0:3]
The data frame contains columns of multiple datatypes. Here are the first five rows and the first three columns of the data frame.
                                   name       state state_code
0                Aaniiih Nakoda College     Montana         MT
1          Abilene Christian University       Texas         TX
2  Abraham Baldwin Agricultural College     Georgia         GA
3                       Academy College   Minnesota         MN
4             Academy of Art University  California         CA
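Before selecting by type, it can help to see which datatype each column has. The snippet below is a minimal sketch that uses a small hand-built data frame (the name `sample` and its contents are assumptions, copied from a few of the rows shown in this post) instead of the full data set, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical two-row sample built from values shown in this post.
# Text columns are created with dtype=object explicitly so the result
# does not depend on the default string dtype of the installed Pandas.
sample = pd.DataFrame({
    "name": pd.Series(["Aaniiih Nakoda College",
                       "Abilene Christian University"], dtype=object),
    "state_code": pd.Series(["MT", "TX"], dtype=object),
    "room_and_board": [float("nan"), 10350.0],  # float because of the NaN
    "in_state_tuition": [2380, 34850],          # integers
})

# .dtypes returns a Series mapping each column name to its datatype
dtype_of = sample.dtypes
print(dtype_of)
```

The same `.dtypes` attribute can be used on the full `df` loaded above.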
Select Columns that are numeric from Pandas dataframe
If we want to select columns that are integers or floats (anything numeric), we can pass the include argument to the select_dtypes() function and specify include='number' as shown below.
df.select_dtypes(include='number').head()
This excludes any non-numeric columns and gives us only the columns that are numeric.
   room_and_board  in_state_tuition  in_state_total  out_of_state_tuition  out_of_state_total
0             NaN              2380            2380                  2380                2380
1         10350.0             34850           45200                 34850               45200
2          8474.0              4128           12602                 12550               21024
3             NaN             17661           17661                 17661               17661
4         16648.0             27810           44458                 27810               44458
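To see what 'number' covers, here is a small sketch on a hand-built data frame (the name `sample` and the explicit dtypes are assumptions made for illustration, with values taken from the rows above). It compares include='number' with the narrower include='float64' and include='int64'.

```python
import pandas as pd

# Hypothetical sample; dtypes are set explicitly so the example
# behaves the same way across platforms and Pandas versions.
sample = pd.DataFrame({
    "name": pd.Series(["Aaniiih Nakoda College",
                       "Abilene Christian University",
                       "Abraham Baldwin Agricultural College"], dtype=object),
    "room_and_board": pd.Series([float("nan"), 10350.0, 8474.0], dtype="float64"),
    "in_state_tuition": pd.Series([2380, 34850, 4128], dtype="int64"),
})

# 'number' matches both integer and float columns
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# A specific dtype name matches only columns of that dtype
float_cols = sample.select_dtypes(include="float64").columns.tolist()
int_cols = sample.select_dtypes(include="int64").columns.tolist()

print(numeric_cols)
print(float_cols)
print(int_cols)
```

Using 'number' is usually the more convenient choice when we do not care whether a column is stored as an integer or a float.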
Select Columns that are Non-numeric from Pandas dataframe
Similarly, if we want to select columns that are non-numeric, which in this data set means columns of type "object", we can use the select_dtypes() function with include='object'.
df.select_dtypes(include='object').head()
And we get columns that are of type “object”.
                                   name       state state_code        type degree_length
0                Aaniiih Nakoda College     Montana         MT      Public        2 Year
1          Abilene Christian University       Texas         TX     Private        4 Year
2  Abraham Baldwin Agricultural College     Georgia         GA      Public        2 Year
3                       Academy College   Minnesota         MN  For Profit        2 Year
4             Academy of Art University  California         CA  For Profit        4 Year
How to exclude columns of certain datatypes from Pandas dataframe
We can also use the select_dtypes() function to drop columns of certain datatypes, by using the argument exclude instead of include.
For example, to select columns that are non-object, we can use select_dtypes() with exclude='object'.
df.select_dtypes(exclude='object').head()
In this case, this gives us the numerical columns, because every column in this data set that is not of type object is numeric.
   room_and_board  in_state_tuition  in_state_total  out_of_state_tuition  out_of_state_total
0             NaN              2380            2380                  2380                2380
1         10350.0             34850           45200                 34850               45200
2          8474.0              4128           12602                 12550               21024
3             NaN             17661           17661                 17661               17661
4         16648.0             27810           44458                 27810               44458
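Note that exclude='object' and include='number' give the same result here only because the tuition data has no other column types. The sketch below illustrates the difference on a hand-built data frame; the name `sample` and the boolean column `is_public` are hypothetical and not part of the tuition data.

```python
import pandas as pd

# Hypothetical sample with an extra boolean column (not in the real data)
sample = pd.DataFrame({
    "name": pd.Series(["Aaniiih Nakoda College",
                       "Abilene Christian University"], dtype=object),
    "room_and_board": pd.Series([float("nan"), 10350.0], dtype="float64"),
    "in_state_tuition": pd.Series([2380, 34850], dtype="int64"),
    "is_public": pd.Series([True, False], dtype="bool"),
})

# exclude='object' keeps everything that is not object, including bool
by_exclude = sample.select_dtypes(exclude="object").columns.tolist()

# include='number' keeps only integer and float columns; bool is left out
by_include = sample.select_dtypes(include="number").columns.tolist()

print(by_exclude)
print(by_include)
```

So if the goal is specifically numeric columns, include='number' states that intent more directly than excluding object columns.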
This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.