Selecting one or more columns from a Pandas dataframe is a common task in exploratory data analysis and in data munging/wrangling in general.
In this post, we will see examples of
- How to select one column from Pandas dataframe?
- How to select multiple columns from Pandas dataframe?
Let us first load the Pandas library.
import pandas as pd
Let us use the gapminder dataset from the Carpentries website to select columns.
data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)
We can see that the gapminder data frame has six columns or variables.
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
How to Select One Column from Dataframe in Pandas?
The easiest way to select a column from a dataframe in Pandas is to use the name of the column of interest. For example, to select the column named “continent”, we pass its name as the argument to []
gapminder['continent']
0    Asia
1    Asia
2    Asia
3    Asia
4    Asia
Passing the column name directly to [] like above returns a Pandas Series object. We can confirm that by calling the type function on the returned object.
>type(gapminder['continent'])
pandas.core.series.Series
If we want to select a single column and want a DataFrame containing just the single column, we need to use [[]], double square bracket with a single column name inside it. For example, to select the continent column and get a Pandas data frame with single column as output
>gapminder[['continent']]
  continent
0      Asia
1      Asia
2      Asia
3      Asia
4      Asia
Note that now the result has column name “continent” hinting that we now have a dataframe. We can check that using type function as before.
>type(gapminder[['continent']])
pandas.core.frame.DataFrame
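The difference between the two bracket styles can be checked without downloading anything. The sketch below is only an illustration: it assumes a small stand-in DataFrame named df, built by hand from the three gapminder rows shown earlier, rather than the full dataset.

```python
import pandas as pd

# Stand-in data (assumption): a few columns from the first three
# gapminder rows shown in this post, typed in by hand.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "continent": ["Asia", "Asia", "Asia"],
    "lifeExp": [28.801, 30.332, 31.997],
})

as_series = df["continent"]    # single brackets -> Series
as_frame = df[["continent"]]   # double brackets -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (3,)
print(as_frame.shape)            # (3, 1)
```

The shapes make the distinction concrete: the Series is one-dimensional, while the one-column DataFrame is still two-dimensional.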
How to Select Multiple Columns from a Data Frame in Pandas?
We can use double square brackets [[]] to select multiple columns from a data frame in Pandas. In the above example, we used a list containing just a single variable/column name to select the column. If we want to select multiple columns, we specify the list of column names in the order we like.
For example, to select two columns “country” and “year”, we use the [[]] with two column names inside.
# select multiple columns using column names as list
gapminder[['country','year']].head()
       country  year
0  Afghanistan  1952
1  Afghanistan  1957
2  Afghanistan  1962
3  Afghanistan  1967
4  Afghanistan  1972
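As a small illustration of "in the order we like", the sketch below selects the same two columns in reverse order, and also shows what happens when a name in the list does not exist. It assumes a hand-built stand-in DataFrame df (not the full gapminder data), and the column name "population" is a deliberately wrong name used only for the demonstration.

```python
import pandas as pd

# Stand-in data (assumption): first three gapminder rows from this post.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
})

# The result follows the order of the list, not the order in df.
cols = ["year", "country"]
subset = df[cols]
print(list(subset.columns))  # ['year', 'country']

# A name that is not a column raises KeyError.
# "population" is a hypothetical, intentionally wrong name.
missing_raised = False
try:
    df[["country", "population"]]
except KeyError as err:
    missing_raised = True
    print("KeyError:", err)
```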
Selecting Multiple Columns in Pandas Using loc
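We can also use the “loc” indexer to select multiple columns. The part before the comma selects rows (a bare “:” keeps all of them) and the part after it selects columns. For example, to select the two columns ['country','year'], we can use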
# select multiple columns using loc
gapminder.loc[:, ['country','year']].head()
       country  year
0  Afghanistan  1952
1  Afghanistan  1957
2  Afghanistan  1962
3  Afghanistan  1967
4  Afghanistan  1972
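A minimal sketch of column selection with loc, again on a hand-built stand-in DataFrame df (an assumption, so that the example runs offline). Besides a list of names, loc also accepts a slice of column labels, which is not covered above; note that such a label slice includes both endpoints.

```python
import pandas as pd

# Stand-in data (assumption): first three gapminder rows from this post.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia", "Asia", "Asia"],
})

# ":" keeps every row; the list names the columns to keep.
by_list = df.loc[:, ["country", "year"]]

# A label slice also works with loc and includes BOTH endpoints.
by_slice = df.loc[:, "country":"pop"]

print(by_list.equals(df[["country", "year"]]))  # True
print(list(by_slice.columns))                   # ['country', 'year', 'pop']
```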
How to Select Multiple Columns Using Column Index in Pandas?
Sometimes, it is easier to select columns by their location instead of the column names.
We can get the column names of a data frame using its columns attribute
# get column names of Pandas dataframe
>gapminder.columns
Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')
Selecting first N columns in Pandas
To select the first two (or N) columns, we can slice the column index with “gapminder.columns[0:2]” (or “gapminder.columns[0:N]”) and pass the resulting names to [].
# select first two columns
gapminder[gapminder.columns[0:2]].head()
       country  year
0  Afghanistan  1952
1  Afghanistan  1957
2  Afghanistan  1962
3  Afghanistan  1967
4  Afghanistan  1972
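An alternative to slicing the column index is the iloc indexer, which selects by integer position directly. This is a different technique from the one used above, shown here only for comparison, on a hand-built stand-in DataFrame df (an assumption, so the example runs offline).

```python
import pandas as pd

# Stand-in data (assumption): first three gapminder rows from this post.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia", "Asia", "Asia"],
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

n = 2
via_names = df[df.columns[:n]]  # slice the column Index, then select by name
via_iloc = df.iloc[:, :n]       # select by integer position directly

print(list(via_iloc.columns))         # ['country', 'year']
print(via_names.equals(via_iloc))     # True

# Position of a given column name within the column Index.
print(df.columns.get_loc("lifeExp"))  # 4
```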
Selecting last N columns in Pandas
One of the advantages of using a column index slice to select columns from a Pandas dataframe is that we can pick out part of the data frame by position. For example, to select the last two (or N) columns, we can use the column index slice “gapminder.columns[-2:gapminder.columns.size]” and select them as before.
# gapminder.columns.size gets the number of columns
# gapminder.columns[-2:gapminder.columns.size] gets the last two columns
gapminder[gapminder.columns[-2:gapminder.columns.size]]
   lifeExp   gdpPercap
0   28.801  779.445314
1   30.332  820.853030
2   31.997  853.100710
3   34.020  836.197138
4   36.088  739.981106
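Since a slice with no end already runs to the last element, the explicit “gapminder.columns.size” bound can be dropped. The sketch below compares the long form, the shorter “[-2:]” form, and a positional iloc variant on a hand-built stand-in DataFrame df (an assumption, so the example runs offline).

```python
import pandas as pd

# Stand-in data (assumption): first three gapminder rows from this post.
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "pop": [8425333.0, 9240934.0, 10267083.0],
    "continent": ["Asia", "Asia", "Asia"],
    "lifeExp": [28.801, 30.332, 31.997],
    "gdpPercap": [779.445314, 820.853030, 853.100710],
})

last_two_long = df[df.columns[-2:df.columns.size]]  # form used in this post
last_two_short = df[df.columns[-2:]]                # open-ended slice
last_two_iloc = df.iloc[:, -2:]                     # positional selection

print(list(last_two_short.columns))          # ['lifeExp', 'gdpPercap']
print(last_two_long.equals(last_two_short))  # True
print(last_two_long.equals(last_two_iloc))   # True
```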