How To Loop Through Pandas Rows? or How To Iterate Over Pandas Rows?

Sometimes you may want to loop over a Pandas dataframe and perform some operation on each row. Pandas offers at least two ways to iterate over the rows of a dataframe.

Let us see examples of how to loop through a Pandas dataframe. First we will use the Pandas iterrows() function to iterate over the rows of a dataframe. Pandas also has another useful function, itertuples(), and we will see examples of using it to iterate over rows as well. There are subtle differences between the two, which we will also cover.

Let us use an interesting dataset available in vega_datasets in Python.

 
# import vega_datasets
from vega_datasets import data
# import pandas
import pandas as pd

Let us see the available datasets in vega_datasets and use the flights_2k dataset.

 
# check to see the list of data sets
data.list_datasets()
# load the flights_2k data set as a dataframe
flights = data.flights_2k()

It contains flight departure, arrival, and distance information for 2000 flights.

 
flights.head()
	date	delay	destination	distance	origin
0	2001-01-14 21:55:00	0	SMF	480	SAN
1	2001-03-26 20:15:00	-11	SLC	507	PHX
2	2001-03-05 14:55:00	-3	LAX	714	ELP

How to Iterate Through Rows with Pandas iterrows()

Pandas has an iterrows() function that helps you loop through each row of a dataframe. iterrows() returns an iterator that yields the index of each row together with the row’s data as a Series.

Since iterrows() returns an iterator, we can use the built-in next() function to look at its first item. We can see that iterrows() yields a tuple containing the row index and the row data as a Series object.

 
>next(flights.iterrows())
(0, date           2001-01-14 21:55:00
 delay                            0
 destination                    SMF
 distance                       480
 origin                         SAN
 Name: 0, dtype: object)

We can get the row’s content by taking the second element of the tuple.

 
row = next(flights.iterrows())[1]
row
date           2001-01-14 21:55:00
delay                            0
destination                    SMF
distance                       480
origin                         SAN
Name: 0, dtype: object

We can loop through a Pandas dataframe and easily access the index and the content of each row. Here we print each item yielded by iterrows() and see that we get an index and a Series for each row.

 
for index, row in flights.head(n=2).iterrows():
    print(index, row)

0 date           2001-01-14 21:55:00
delay                            0
destination                    SMF
distance                       480
origin                         SAN
Name: 0, dtype: object
1 date           2001-03-26 20:15:00
delay                          -11
destination                    SLC
distance                       507
origin                         PHX
Name: 1, dtype: object

Since the row data is returned as a Series, we can use the column names to access each column’s value in the row. Here we loop through each row and we assign the row index and row data to variables named index and row. Then we access row data using the column names of the dataframe.

 
# iterate over rows with iterrows()
for index, row in flights.head().iterrows():
    # access data using column names
    print(index, row['delay'], row['distance'], row['origin'])

 
0 0 480 SAN
1 -11 507 PHX
2 -3 714 ELP
3 12 342 SJC
4 2 373 SMF
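As a small sketch of doing some work inside the loop, the example below collects the flights that departed late. To keep it runnable without vega_datasets, it uses a hand-built dataframe named sample (an assumption for illustration) with the values copied from the output above; late_origins is likewise a made-up name.

```python
import pandas as pd

# Hand-built stand-in for flights.head(); values copied from the output above.
sample = pd.DataFrame({
    "delay": [0, -11, -3, 12, 2],
    "distance": [480, 507, 714, 342, 373],
    "origin": ["SAN", "PHX", "ELP", "SJC", "SMF"],
})

# Collect (index, origin, delay) for every flight that left late (delay > 0).
late_origins = []
for index, row in sample.iterrows():
    if row["delay"] > 0:
        late_origins.append((index, row["origin"], row["delay"]))

print(late_origins)
```

Only the rows with index 3 and 4 have a positive delay, so those are the two entries that end up in the list.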

Because iterrows() returns a Series for each row, it does not preserve data types across the row. A Series has a single data type, so values coming from columns of different types are converted to a common type. In the dataframe itself, by contrast, each column keeps its own data type. Let us see a simple example illustrating this.

Let us create a simple dataframe with one row and two columns, where one column is an int and the other is a float.

 
>df = pd.DataFrame([[3, 5.5]], columns=['int_column', 'float_column'])
>print(df)
   int_column  float_column
0           3           5.5

Let us use iterrows() to get the content of the row and print the data type of int_column. In the original dataframe int_column is an integer. However, when we look at its data type through iterrows(), the int_column value is a float.

 
>row = next(df.iterrows())[1]
>print(row['int_column'].dtype)
float64
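To make the contrast explicit, here is a minimal sketch that puts the column data types next to the data type of the row Series. It rebuilds the same one-row dataframe; value_from_column is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([[3, 5.5]], columns=["int_column", "float_column"])

# Column data types as stored in the dataframe: one per column.
print(df.dtypes)

# The row Series has a single dtype, so the integer is upcast to float.
row = next(df.iterrows())[1]
print(row.dtype)

# Reading the value straight from the column keeps the integer type.
value_from_column = df["int_column"].iloc[0]
print(type(value_from_column))
```

If the exact type of a value matters inside the loop, reading it from the column (or using itertuples()) avoids the conversion.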

How to Iterate Over Rows of Pandas Dataframe with itertuples()

A better way to loop through the rows of a Pandas dataframe is to use the itertuples() function. As the name suggests, itertuples() loops through the rows of a dataframe and returns each row as a named tuple.
The first element of the tuple is the row’s index and the remaining values are the data in the row. Unlike iterrows(), the row data is not stored in a Series.

Let us loop through the dataframe and print each row with itertuples().

 
for row in flights.head().itertuples():
    print(row)
Pandas(Index=0, date=Timestamp('2001-01-14 21:55:00'), delay=0, destination='SMF', distance=480, origin='SAN')
Pandas(Index=1, date=Timestamp('2001-03-26 20:15:00'), delay=-11, destination='SLC', distance=507, origin='PHX')
Pandas(Index=2, date=Timestamp('2001-03-05 14:55:00'), delay=-3, destination='LAX', distance=714, origin='ELP')
Pandas(Index=3, date=Timestamp('2001-01-07 12:30:00'), delay=12, destination='SNA', distance=342, origin='SJC')
Pandas(Index=4, date=Timestamp('2001-01-18 12:00:00'), delay=2, destination='LAX', distance=373, origin='SMF')

We can see that itertuples() simply returns the content of each row as a named tuple with the associated column names. Therefore we can access the data by column name and Index, like this:

 
for row in flights.head().itertuples():
    print(row.Index, row.date, row.delay)

This prints each row as:

 
0 2001-01-14 21:55:00 0
1 2001-03-26 20:15:00 -11
2 2001-03-05 14:55:00 -3
3 2001-01-07 12:30:00 12
4 2001-01-18 12:00:00 2
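itertuples() also takes two optional parameters, index and name, that control the shape of the tuples. The sketch below tries them on the small one-row dataframe from the data type example (rebuilt here so the block runs on its own); the tuple name "Row" is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame([[3, 5.5]], columns=["int_column", "float_column"])

# Default: a named tuple called "Pandas" with the index as the first field.
first = next(df.itertuples())
print(first)

# index=False drops the index; name sets the named tuple's class name.
no_index = next(df.itertuples(index=False, name="Row"))
print(no_index)

# name=None returns plain tuples instead of named tuples.
plain = next(df.itertuples(index=False, name=None))
print(plain)

# Unlike iterrows(), the integer column is not upcast to float.
print(type(first.int_column))
```

Because each value is taken from its own column, the integer in int_column stays an integer here.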

Another benefit of itertuples is that it is generally faster than iterrows().
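One rough way to check the speed difference on your own machine is to time both loops with timeit. This is only a sketch: the dataframe is synthetic (2000 made-up rows, the same size as flights_2k), the helper function names are invented for illustration, and the actual timings will vary with the Pandas version and the hardware.

```python
import timeit

import pandas as pd

# Synthetic stand-in data (2000 rows, like flights_2k); the values are made up.
df = pd.DataFrame({
    "delay": list(range(2000)),
    "distance": [float(d) for d in range(2000)],
})

def total_with_iterrows(frame):
    # Sum the delay column one row at a time using iterrows().
    total = 0
    for _, row in frame.iterrows():
        total += row["delay"]
    return total

def total_with_itertuples(frame):
    # Sum the delay column one row at a time using itertuples().
    total = 0
    for row in frame.itertuples():
        total += row.delay
    return total

# Run each loop three times and report the total elapsed time.
iterrows_seconds = timeit.timeit(lambda: total_with_iterrows(df), number=3)
itertuples_seconds = timeit.timeit(lambda: total_with_itertuples(df), number=3)

print(f"iterrows:   {iterrows_seconds:.4f} s")
print(f"itertuples: {itertuples_seconds:.4f} s")
```

Both functions compute the same total, so any difference in the printed times comes from how the rows are produced.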
