How To Select Columns by Data Type in Pandas?

Often, when you are working with a larger dataframe and doing data cleaning or exploratory data analysis, you might want to select the columns of a Pandas dataframe by their data types.

For example, you might want to quickly select the numerical columns and look at their summary statistics. Or you might want to select the categorical columns and check their levels.

Let us see examples of selecting columns based on their data type. Pandas has a somewhat obscure but very useful function called select_dtypes that helps us select columns by their data types.

Let us load Pandas, along with NumPy, which we need later for np.number.

# import pandas and numpy
import numpy as np
import pandas as pd

Let us load the gapminder data as a data frame.

# gapminder data
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)
	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710

Let us check the data types of the gapminder dataframe using the Pandas attribute dtypes.

# get datatypes of columns in the dataframe
gapminder.dtypes
country       object
year           int64
pop          float64
continent     object
lifeExp      float64
gdpPercap    float64
dtype: object
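
As a rough cross-check, the dtypes result is a Series indexed by column name, so it can also be scanned by hand. The sketch below rebuilds the three rows printed above as a small frame (typed in by hand rather than downloaded) and lists the float columns with pd.api.types.is_float_dtype. The names df and float_cols are illustrative only.

```python
import pandas as pd

# The three gapminder rows printed above, typed in by hand for illustration
df = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia"] * 3,
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

# dtypes is a Series mapping column name -> dtype, so we can loop over it
float_cols = [
    col for col, dtype in df.dtypes.items()
    if pd.api.types.is_float_dtype(dtype)
]
print(float_cols)
```

This manual loop is what select_dtypes saves us from writing.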

How To Select Columns with Numerical Data Types

The Pandas select_dtypes function allows us to specify a data type and select the columns matching that data type.

For example, to select columns with a numerical data type, we can use select_dtypes with the argument np.number (the string 'number' works as well). This returns a new data frame containing only the numerical columns.

gapminder.select_dtypes(np.number).head(3)

year	pop	lifeExp	gdpPercap
0	1952	8425333.0	28.801	779.445314
1	1957	9240934.0	30.332	820.853030
2	1962	10267083.0	31.997	853.100710
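
To tie this back to the motivating use case, here is a minimal sketch that selects the numerical columns and summarizes them with describe. It assumes a small hand-typed frame holding the three rows shown above instead of the downloaded file. The names df, numeric and summary are illustrative.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the three rows shown above (illustration only)
df = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia"] * 3,
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

# Keep only integer and float columns
numeric = df.select_dtypes(include=np.number)
numeric_cols = list(numeric.columns)

# The string alias 'number' selects the same columns
same_cols = list(df.select_dtypes(include="number").columns)

# Summary statistics for the numerical columns only
summary = numeric.describe()
print(summary)
```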

We can also be more specific and select data types matching “float” or “integer”. If we want to select columns with the float data type, we use

gapminder.select_dtypes('float').head(3)

pop	lifeExp	gdpPercap
0	8425333.0	28.801	779.445314
1	9240934.0	30.332	820.853030
2	10267083.0	31.997	853.100710
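
The integer case works the same way. The sketch below, again on a hand-typed copy of the three rows above, splits the numerical columns into float and integer groups. It passes NumPy's abstract np.integer type for the integer side, which is one of several ways to spell it.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the three rows shown above (illustration only)
df = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia"] * 3,
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

# Float columns, as in the example above
float_cols = list(df.select_dtypes(include="float").columns)

# Integer columns, selected via NumPy's abstract integer type
int_cols = list(df.select_dtypes(include=np.integer).columns)

print(float_cols)
print(int_cols)
```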

How to Select Columns by Excluding Certain Data Types in Pandas?

We can also exclude certain data types while selecting columns. The argument we need to use to exclude certain data types is exclude. For example, to exclude columns of float data type,

# exclude a data type
gapminder.select_dtypes(exclude='float').head(3)

	country	year	continent
0	Afghanistan	1952	Asia
1	Afghanistan	1957	Asia
2	Afghanistan	1962	Asia

Now you can see that the resulting data frame does not contain any variables with float data type.
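
The include and exclude arguments can also be combined in one call. As a small sketch on a hand-typed copy of the rows above, the following keeps numerical columns while dropping the float ones, which leaves only the integer column year.

```python
import pandas as pd

# Hand-typed copy of the three rows shown above (illustration only)
df = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia"] * 3,
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

# Drop every float column, keep the rest
non_float_cols = list(df.select_dtypes(exclude="float").columns)

# Keep numerical columns, but not the float ones
numeric_non_float_cols = list(
    df.select_dtypes(include="number", exclude="float").columns
)

print(non_float_cols)
print(numeric_non_float_cols)
```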

Other useful arguments to select_dtypes are ‘category’ for selecting or excluding categorical variables, and np.datetime64, ‘datetime’ or ‘datetime64’ for datetime columns.
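
The gapminder frame as loaded has neither categorical nor datetime columns, so the sketch below builds a tiny hypothetical frame to show both. The recorded dates and the "Europe" level are made up for illustration and are not part of the gapminder data.

```python
import pandas as pd

# Hypothetical frame: 'recorded' and the 'Europe' level are invented for this example
df = pd.DataFrame({
    "continent": pd.Categorical(["Asia", "Europe", "Asia"]),
    "recorded": pd.to_datetime(["1952-01-01", "1957-01-01", "1962-01-01"]),
    "lifeExp": [28.801, 30.332, 31.997],
})

# Select categorical columns and inspect their levels
cat_cols = list(df.select_dtypes(include="category").columns)
levels = list(df["continent"].cat.categories)

# Select datetime columns
dt_cols = list(df.select_dtypes(include="datetime").columns)

print(cat_cols, levels, dt_cols)
```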