Often, when you are working with a bigger dataframe and doing data cleaning or exploratory data analysis, you might want to select columns of a Pandas dataframe by their data types.
For example, you might want to quickly select the columns that are numerical and visualize their summary data. Or you might want to select the columns that are categorical and check their levels.
Let us see examples of selecting columns based on their data type. Pandas has a somewhat obscure but very useful function called select_dtypes that helps us select columns by their data types.
Let us load Pandas.
# import pandas
import pandas as pd
Let us load the gapminder data as a data frame.
# gapminder data
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)

       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
Let us check the data types of the gapminder dataframe using the Pandas attribute dtypes.
# get data types of the columns in the dataframe
gapminder.dtypes

country       object
year           int64
pop          float64
continent     object
lifeExp      float64
gdpPercap    float64
dtype: object
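As a small sketch of how the dtypes output can be used programmatically, the snippet below builds a hypothetical toy dataframe (not the gapminder data itself) and checks column types with pandas.api.types.is_numeric_dtype. The names toy and dtype_names are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data that mimics a few gapminder columns
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [1952, 1952, 1952],
    "lifeExp": [28.801, 55.23, 43.077],
})

# Map each column name to the string name of its dtype
dtype_names = {col: str(dtype) for col, dtype in toy.dtypes.items()}

# Check individual columns for a numeric dtype
year_is_numeric = is_numeric_dtype(toy["year"])
country_is_numeric = is_numeric_dtype(toy["country"])

print(dtype_names)
print(year_is_numeric, country_is_numeric)
```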
How to Select Columns with Numerical Data Types
The Pandas select_dtypes function allows us to specify a data type and select the columns matching that data type.
For example, to select columns with a numerical data type, we can use select_dtypes with the argument np.number, which matches both integer and float columns. We get a new data frame containing only the numerical columns.
# np.number comes from NumPy, so import it first
import numpy as np
gapminder.select_dtypes(np.number).head(n=3)

   year         pop  lifeExp   gdpPercap
0  1952   8425333.0   28.801  779.445314
1  1957   9240934.0   30.332  820.853030
2  1962  10267083.0   31.997  853.100710
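A typical follow-up is to summarize only the numerical columns. Here is a minimal sketch, assuming a hypothetical toy dataframe in place of the gapminder data so that the snippet runs without a download. The names toy, numeric, and summary are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that mimics a few gapminder columns
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [1952, 1952, 1952],
    "lifeExp": [28.801, 55.23, 43.077],
})

# Keep only the columns with a numeric dtype (integers and floats)
numeric = toy.select_dtypes(include=np.number)
numeric_cols = list(numeric.columns)

# Summary statistics for the numeric columns only
summary = numeric.describe()

print(numeric_cols)
print(summary)
```

Because select_dtypes returns a regular dataframe, methods such as describe can be chained directly onto its result.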
We can also be more specific and select data types matching “float” or “integer”. If we want to select columns with the float data type, we use
gapminder.select_dtypes('float').head(n=3)

          pop  lifeExp   gdpPercap
0   8425333.0   28.801  779.445314
1   9240934.0   30.332  820.853030
2  10267083.0   31.997  853.100710
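The same idea works for integer columns. The sketch below assumes a hypothetical toy dataframe and uses np.integer, the NumPy abstract type that covers all integer widths, alongside the 'float' string used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that mimics a few gapminder columns
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [1952, 1952, 1952],
    "pop": [8425333.0, 1282697.0, 9279525.0],
    "lifeExp": [28.801, 55.23, 43.077],
})

# Select only the integer columns
int_cols = list(toy.select_dtypes(include=np.integer).columns)

# Select only the float columns
float_cols = list(toy.select_dtypes(include="float").columns)

print(int_cols)
print(float_cols)
```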
How to Select Columns by Excluding Certain Data Types in Pandas?
We can also exclude certain data types while selecting columns, using the exclude argument. For example, to exclude columns of the float data type, we use
# exclude a data type
gapminder.select_dtypes(exclude='float').head(n=3)

       country  year continent
0  Afghanistan  1952      Asia
1  Afghanistan  1957      Asia
2  Afghanistan  1962      Asia
Now you can see that the resulting data frame does not contain any variables with float data type.
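The include and exclude arguments can also be combined in a single call. The following is a minimal sketch on a hypothetical toy dataframe: it keeps the numeric columns but drops the float ones, which leaves only the integer columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that mimics a few gapminder columns
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [1952, 1952, 1952],
    "lifeExp": [28.801, 55.23, 43.077],
})

# Exclude the float columns; everything else is kept
no_float_cols = list(toy.select_dtypes(exclude="float").columns)

# Keep numeric columns, then drop the float ones among them
numeric_not_float_cols = list(
    toy.select_dtypes(include=np.number, exclude="float").columns
)

print(no_float_cols)
print(numeric_not_float_cols)
```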
Other useful values to pass to select_dtypes are ‘category’, for selecting or excluding categorical variables, and np.datetime64, ‘datetime’ or ‘datetime64’, for date times.
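To illustrate these last options, here is a short sketch on a hypothetical toy dataframe. The gapminder columns shown earlier are not categorical or datetime typed, so the continent and date columns below are constructed just for this example.

```python
import pandas as pd

# Hypothetical toy data with a categorical and a datetime column
toy = pd.DataFrame({
    "continent": pd.Categorical(["Asia", "Europe", "Africa"]),
    "date": pd.to_datetime(["1952-01-01", "1957-01-01", "1962-01-01"]),
    "lifeExp": [28.801, 55.23, 43.077],
})

# Select the categorical columns
cat_cols = list(toy.select_dtypes(include="category").columns)

# Select the datetime columns
dt_cols = list(toy.select_dtypes(include="datetime").columns)

# Exclude both categorical and datetime columns by passing a list
other_cols = list(toy.select_dtypes(exclude=["category", "datetime"]).columns)

print(cat_cols, dt_cols, other_cols)
```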