3 Ways to Select One or More Columns with Pandas

Selecting one or more columns from a Pandas dataframe, that is, subsetting it by column, is one of the most common tasks in data analysis. Pandas offers several ways to do it.

In this post, we will see 3 ways to select one or more columns with Pandas. Let us first load the needed packages including Pandas and NumPy.

# load pandas
import pandas as pd
# load numpy
import numpy as np
# check Pandas' version
pd.__version__
'1.0.0'

We will use NumPy’s random module to create some data and store it as a Pandas dataframe.

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))
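np.random.randint draws new numbers on every run. If you want the same numbers each time, one option is a seeded generator. This is a sketch, not the setup used for the outputs in this post: the seed 42 is an arbitrary choice, and np.random.default_rng needs NumPy 1.17 or later.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same "random" numbers on every run.
# The seed 42 is arbitrary and only chosen for illustration.
rng = np.random.default_rng(42)

# integers(20, ...) draws values from 0 up to, but not including, 20
df = pd.DataFrame(rng.integers(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))
print(df.shape)  # (8, 3)
```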

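This is a toy dataframe with just 3 columns and 8 rows, and all of its columns have the same type. Real-world dataframes often mix several data types, and the selection methods below work the same way on them. Because the values are drawn at random, the numbers you see will differ from the outputs shown in this post, which themselves come from separate runs.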

df.head()

	A	B	C
i	6	19	14
j	10	7	6
k	18	10	10
l	3	7	2
m	1	11	5
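To illustrate the point that column selection does not depend on the column types, here is a small sketch with a mixed-type dataframe. The `people` dataframe and its columns are hypothetical and only made up for this example.

```python
import pandas as pd

# Hypothetical dataframe mixing string, integer, and float columns
people = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy'],
                       'age': [34, 28, 45],
                       'score': [88.5, 92.0, 79.5]})

# Selecting columns works the same way whatever their dtypes are
subset = people[['name', 'score']]
print(subset.dtypes)
```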

1. How To Select a Single Column with the Indexing Operator []?

One way to select a column from a Pandas dataframe is to use square brackets. In Pandas, square brackets act as the indexing operator, which lets us select columns.

One thing to note is that if we want the result to stay a dataframe, we need to pass the column name to the indexing operator [] inside a list.

Therefore, to select the single column named “A” as a dataframe, we put the column name in a list and pass it to the indexing operator, as in df[['A']]. Yes, that means double square brackets, and the result is another, subsetted dataframe.


df[['A']]

	A
i	9
j	11
k	11
l	2
m	8
n	11
o	17
p	17
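A quick way to confirm that double square brackets return a dataframe is to check the type and shape of the result. The dataframe is rebuilt here the same way as above so the snippet runs on its own; its values are random.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

one_col = df[['A']]                      # list with one name -> DataFrame
print(isinstance(one_col, pd.DataFrame))  # True
print(one_col.shape)                      # (8, 1)
print(list(one_col.columns))              # ['A']
```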

We can also select a single column by passing just the column name to the indexing operator [], without a list. However, the result is then a Pandas Series rather than a Pandas DataFrame. For example, df['A'] selects column A as a Series object.

df['A']

i    18
j     2
k     6
l    17
m    17
n    19
o    11
p     2
Name: A, dtype: int64

Note that the Series has no column header; instead, the column name is stored as the Series' name, which is why the output ends with "Name: A". The subsetted dataframe, by contrast, keeps the selected column name as its column label.
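The sketch below checks that difference directly: the Series carries the column name in its `name` attribute, and `Series.to_frame()` turns it back into the same one-column dataframe that df[['A']] returns. The dataframe is rebuilt with random values so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

s = df['A']                      # single name, no list -> Series
print(isinstance(s, pd.Series))  # True
print(s.name)                    # A  (the column name lives on as the Series name)

# to_frame() uses the Series name as the column label
back = s.to_frame()
print(back.equals(df[['A']]))    # True
```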

How To Select Multiple Columns with the Indexing Operator [] in Pandas?

We can pass a list of column names to the indexing operator to select more than one column. For example, to select two columns, we put both column names in a list and pass that list to the indexing operator.

df[['A','B']]

	A	B
i	6	15
j	3	3
k	15	5
l	11	4
m	17	19
n	7	4
o	17	8
p	17	6
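Two details of list-based selection are worth knowing: the columns come back in the order they appear in the list, and, in Pandas 1.0 and later, a name that is not in the dataframe raises a KeyError. A small sketch, with the dataframe rebuilt from random values and 'Z' used as a deliberately missing column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

# Columns follow the order of the list, not the dataframe's order
reordered = df[['C', 'A']]
print(list(reordered.columns))  # ['C', 'A']

# 'Z' is not a column, so the selection raises a KeyError
error_message = None
try:
    df[['A', 'Z']]
except KeyError as err:
    error_message = str(err)
print(error_message)
```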

2. How To Select Multiple Columns with the .loc Accessor in Pandas?

The second way to select one or more columns of a Pandas dataframe is the .loc accessor. The .loc[] accessor can select both rows and columns; in this example, we use it to select one or more columns from a dataframe.

To select all rows and only some of the columns, we use the .loc accessor with square brackets. Inside them, we first ask for all rows with the colon symbol “:” and then, after a comma, provide the list of column names we want. In the example below, passing a list of two columns to .loc[] gives us a dataframe with just those columns.

df.loc[:,['A','B']]

	A	B
i	6	15
j	3	3
k	15	5
l	11	4
m	17	19
n	7	4
o	17	8
p	17	6
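Besides a list, .loc[] also accepts a slice of column labels. Unlike ordinary Python slices, a label slice includes its end label, so 'A':'B' returns both A and B. A short sketch, again on a freshly built random dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

# Label slice: every column from 'A' through 'B', end label included
ab = df.loc[:, 'A':'B']
print(list(ab.columns))                  # ['A', 'B']
print(ab.equals(df.loc[:, ['A', 'B']]))  # True

# An open-ended slice runs to the last column
from_b = df.loc[:, 'B':]
print(list(from_b.columns))              # ['B', 'C']
```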

We can also use .loc[] to select a single column by providing the column name as a list to .loc[].
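As with the plain indexing operator, wrapping the name in a list keeps the result a dataframe, while a bare name returns a Series. A minimal sketch on a random dataframe built as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

col_df = df.loc[:, ['A']]   # name in a list -> one-column DataFrame
col_s = df.loc[:, 'A']      # bare name -> Series

print(isinstance(col_df, pd.DataFrame), col_df.shape)  # True (8, 1)
print(isinstance(col_s, pd.Series), col_s.name)        # True A
```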

3. How To Select Multiple Columns with the .iloc Accessor in Pandas?

The third way to select columns of a dataframe in Pandas is the .iloc[] accessor. In the two methods above, we used column names to subset the dataframe. With .iloc[] we cannot use column names; instead, we specify the integer positions of the columns, counting from 0.

To select multiple columns using .iloc[], we first keep all rows with the colon symbol “:” and then provide the list of positions of the columns we want.

In the example below, we select the 2nd and 3rd columns of the dataframe with the list [1,2], since positions start at 0.

df.iloc[:,[1,2]]

The result from .iloc[] is another dataframe with fewer columns.

	B	C
i	19	14
j	7	6
k	10	10
l	7	2
m	11	5
n	0	11
o	16	9
p	14	14
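.iloc[] also accepts integer slices and negative positions. Integer slices follow normal Python rules, so the stop position is excluded: 1:3 picks positions 1 and 2, the same columns as [1, 2]. A sketch on a random dataframe built as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

# Positions 1 and 2 (columns B and C); the stop position 3 is excluded
bc = df.iloc[:, 1:3]
print(list(bc.columns))               # ['B', 'C']
print(bc.equals(df.iloc[:, [1, 2]]))  # True

# Negative positions count from the end; -1 is the last column
last = df.iloc[:, [-1]]
print(list(last.columns))             # ['C']
```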

Using .iloc[] with positions is clearly more cumbersome for selecting columns, because we have to know where each column sits in the dataframe. When the column names are known, the first two methods, which select columns by name, are usually the better options for selecting columns in a Pandas dataframe.
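To tie the three methods together, the sketch below checks that they return the same dataframe for columns B and C, and shows how the dataframe's column Index translates between positions and names when you only have one of them. The dataframe is rebuilt with random values so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(20, size=(8, 3)),
                  index=list('ijklmnop'),
                  columns=list('ABC'))

by_brackets = df[['B', 'C']]       # 1. indexing operator with names
by_loc = df.loc[:, ['B', 'C']]     # 2. .loc with names
by_iloc = df.iloc[:, [1, 2]]       # 3. .iloc with positions
print(by_brackets.equals(by_loc) and by_loc.equals(by_iloc))  # True

# Positions -> names, via the column Index
names = list(df.columns[[1, 2]])
print(names)                       # ['B', 'C']

# Name -> position, via Index.get_loc
print(df.columns.get_loc('C'))     # 2
```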

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.