How To Select One or More Columns in Pandas?


Selecting one or more columns from a Pandas dataframe is a common task in exploratory data analysis and in data munging/wrangling for data science.

In this post, we will see examples of:

  • How to select one column from a Pandas dataframe?
  • How to select multiple columns from a Pandas dataframe?

Let us first load the Pandas library.

import pandas as pd

Let us use the gapminder dataset from the Carpentries website to select columns.

data_url = 'http://bit.ly/2cLzoxH'
gapminder = pd.read_csv(data_url)
gapminder.head(n=3)
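If the short URL is unreachable (for example, when working offline), a small stand-in DataFrame built from the three rows shown below can be used to try the examples in this post. This is a sketch under that assumption: the name `gapminder_small` is hypothetical, and the frame holds only these three rows, not the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for the gapminder data: only the three rows
# displayed by gapminder.head(n=3), with the same six columns.
gapminder_small = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'pop': [8425333.0, 9240934.0, 10267083.0],
    'continent': ['Asia', 'Asia', 'Asia'],
    'lifeExp': [28.801, 30.332, 31.997],
    'gdpPercap': [779.445314, 820.853030, 853.100710],
})

print(gapminder_small.head(n=3))
print(gapminder_small.shape)  # (3, 6): three rows, six columns
```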

We can see that the gapminder data frame has six columns, or variables.

	country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710

How to Select One Column from Dataframe in Pandas?

The easiest way to select a column from a dataframe in Pandas is to use the name of the column of interest. For example, to select the column named “continent”, we pass its name as the argument to []:

gapminder['continent'].head()

0    Asia
1    Asia
2    Asia
3    Asia
4    Asia

Passing the column name directly to [], as above, returns a Pandas Series object. We can confirm this by applying the type function to the returned object.

type(gapminder['continent'])
pandas.core.series.Series
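A quick way to see what single brackets return is to inspect the Series directly. The sketch below uses a small hypothetical DataFrame named `df` (three rows modeled on the output above) rather than the downloaded data. The Series keeps the column name in its `name` attribute, and `Series.to_frame()` turns it back into a one-column DataFrame.

```python
import pandas as pd

# Hypothetical three-row sample with two of the gapminder columns
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'continent': ['Asia', 'Asia', 'Asia'],
})

continent = df['continent']          # single brackets -> Series
print(type(continent))               # <class 'pandas.core.series.Series'>
print(continent.name)                # continent (the Series keeps the column name)

# Convert the Series back into a single-column DataFrame if needed
continent_df = continent.to_frame()
print(type(continent_df))            # <class 'pandas.core.frame.DataFrame'>
print(continent_df.columns.tolist()) # ['continent']
```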

If we want a DataFrame containing just that single column instead, we use double square brackets [[]] with the column name inside, that is, a list holding one column name. For example, to select the continent column and get a Pandas data frame with a single column as output:

gapminder[['continent']].head()

  continent
0	Asia
1	Asia
2	Asia
3	Asia
4	Asia

Note that the result now has the column name “continent” as a header, hinting that we have a dataframe. We can check this with the type function as before.

type(gapminder[['continent']])
pandas.core.frame.DataFrame
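The difference also shows up in the shape: a Series is one-dimensional, while the double-bracket result is a two-dimensional DataFrame with a single column. A minimal check on a hypothetical three-row sample `df`:

```python
import pandas as pd

# Hypothetical three-row sample standing in for gapminder
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'continent': ['Asia', 'Asia', 'Asia'],
})

as_series = df['continent']     # one-dimensional Series
as_frame = df[['continent']]    # two-dimensional DataFrame, one column

print(as_series.shape)                # (3,)
print(as_frame.shape)                 # (3, 1)
print(as_series.ndim, as_frame.ndim)  # 1 2
```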

How to Select Multiple Columns from a Data Frame in Pandas?

We can also use double square brackets [[]] to select multiple columns from a data frame in Pandas. In the example above, we used a list containing a single column name to select one column. To select multiple columns, we pass a list of column names in the order we want them to appear.

For example, to select the two columns “country” and “year”, we put both column names inside [[]].

# select multiple columns using column names as list
gapminder[['country','year']].head()

	country	year
0	Afghanistan	1952
1	Afghanistan	1957
2	Afghanistan	1962
3	Afghanistan	1967
4	Afghanistan	1972
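Because the inner brackets hold an ordinary Python list, the columns come back in the order they are listed, and the list can be built in a variable beforehand. A sketch on a hypothetical sample `df`; the variable name `cols` is illustrative. Asking for a name that is not a column raises a `KeyError`.

```python
import pandas as pd

# Hypothetical three-row sample standing in for gapminder
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'pop': [8425333.0, 9240934.0, 10267083.0],
})

# The list order decides the column order of the result
cols = ['year', 'country']
subset = df[cols]
print(subset.columns.tolist())   # ['year', 'country']

# Asking for a column that does not exist raises KeyError
try:
    df[['country', 'gdp']]
except KeyError as err:
    print('missing column:', err)
```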

Selecting Multiple Columns in Pandas Using loc

We can also use the “loc” indexer to select multiple columns. Its first argument selects rows and its second selects columns, so to select all rows of the two columns ['country', 'year'], we can use

# select multiple columns using loc
gapminder.loc[:, ['country', 'year']].head()

	country	year
0	Afghanistan	1952
1	Afghanistan	1957
2	Afghanistan	1962
3	Afghanistan	1967
4	Afghanistan	1972
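In `loc[:, [...]]`, the `:` before the comma selects all rows and the list after it selects columns by name. `loc` also accepts a slice of column labels; unlike positional slicing, a label slice includes both endpoints. A sketch on a hypothetical sample `df`:

```python
import pandas as pd

# Hypothetical three-row sample standing in for gapminder
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'pop': [8425333.0, 9240934.0, 10267083.0],
    'continent': ['Asia', 'Asia', 'Asia'],
})

# All rows, an explicit list of columns
by_list = df.loc[:, ['country', 'year']]
print(by_list.columns.tolist())    # ['country', 'year']

# All rows, every column from 'country' through 'pop' (endpoint included)
by_slice = df.loc[:, 'country':'pop']
print(by_slice.columns.tolist())   # ['country', 'year', 'pop']

# Rows can be restricted at the same time: row labels 0 through 1
print(df.loc[0:1, ['country', 'year']])
```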

How to Select Multiple Columns Using Column Index in Pandas?

Sometimes, it is easier to select columns by their location instead of the column names.

We can get the column names of a data frame using its columns attribute:

# get column names of Pandas dataframe
gapminder.columns
Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')
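`columns` returns a pandas `Index`. It can be converted to a plain list, and `Index.get_loc` gives the position of a given name, which is handy when switching between names and positions. A sketch on a hypothetical empty frame `df` with the same six column names:

```python
import pandas as pd

# Hypothetical frame with the gapminder column names and no rows
df = pd.DataFrame(columns=['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'])

names = df.columns.tolist()
print(names)                             # the six names as a plain Python list
print(df.columns.size)                   # 6
print(df.columns.get_loc('continent'))   # 3 (positions start at 0)
print(df.columns[3])                     # continent
```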

Selecting first N columns in Pandas

To select the first two (or, in general, the first N) columns, we can slice the column index with “gapminder.columns[0:2]” and pass the result to [] to get the first two columns of the Pandas dataframe.

# select first two columns
gapminder[gapminder.columns[0:2]].head()

	country	year
0	Afghanistan	1952
1	Afghanistan	1957
2	Afghanistan	1962
3	Afghanistan	1967
4	Afghanistan	1972
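The same selection can be written with the positional indexer `iloc`, which takes a row slice and a column slice. A sketch on a hypothetical sample `df`, with a hypothetical variable `n` for the number of columns to keep:

```python
import pandas as pd

# Hypothetical three-row sample standing in for gapminder
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'pop': [8425333.0, 9240934.0, 10267083.0],
    'continent': ['Asia', 'Asia', 'Asia'],
})

n = 2  # number of leading columns to keep

first_by_names = df[df.columns[0:n]]   # slice the column Index, then select by name
first_by_position = df.iloc[:, 0:n]    # slice column positions directly

print(first_by_position.columns.tolist())        # ['country', 'year']
print(first_by_names.equals(first_by_position))  # True
```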

Selecting last N columns in Pandas

An advantage of slicing the column index is that we can select any contiguous block of columns by position, not just the first few. For example, to select the last two (or N) columns, we can use the slice “gapminder.columns[-2:gapminder.columns.size]” and select them as before.

# gapminder.columns.size gets the number of columns
# gapminder.columns[-2:gapminder.columns.size] gets the last two columns
gapminder[gapminder.columns[-2:gapminder.columns.size]].head()
	lifeExp	gdpPercap
0	28.801	779.445314
1	30.332	820.853030
2	31.997	853.100710
3	34.020	836.197138
4	36.088	739.981106
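A negative start with an open end, `columns[-2:]`, already runs to the last column, so `columns.size` is not needed as the stop value; `iloc[:, -n:]` is the positional equivalent. A sketch on a hypothetical sample `df`, with a hypothetical variable `n`:

```python
import pandas as pd

# Hypothetical three-row sample with four of the gapminder columns
df = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'year': [1952, 1957, 1962],
    'lifeExp': [28.801, 30.332, 31.997],
    'gdpPercap': [779.445314, 820.853030, 853.100710],
})

n = 2  # number of trailing columns to keep

last_verbose = df[df.columns[-n:df.columns.size]]  # explicit stop value
last_short = df[df.columns[-n:]]                   # open-ended slice
last_position = df.iloc[:, -n:]                    # positional indexer

print(last_short.columns.tolist())  # ['lifeExp', 'gdpPercap']
print(last_verbose.equals(last_short), last_short.equals(last_position))  # True True
```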