In this post, we will learn how to reset the index of a Pandas dataframe so that it starts from zero. We will use the pandas reset_index() function to do this.
Often you start with a big dataframe in Pandas, and after manipulating and filtering it you end up with a much smaller dataframe.
The smaller dataframe may still carry the row index of the original dataframe. If the original row index consisted of numbers, the index of the smaller dataframe will no longer run continuously from 0 to one less than the number of rows. You might want to reset the index of the small dataframe so that it starts at zero, and pandas reset_index() is here to help us.
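To see the effect in isolation, here is a minimal sketch on a small made-up dataframe. The `toy` frame and its values are hypothetical and are not part of the gapminder data used below.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the behaviour.
toy = pd.DataFrame({"name": ["a", "b", "c", "d", "e"],
                    "value": [10, 25, 5, 40, 30]})

# Keep only rows with value above 20; surviving rows keep their old labels.
small = toy[toy["value"] > 20]

print(list(small.index))  # [1, 3, 4]
print(len(small))         # 3
```

The filtered frame has three rows, but its labels are 1, 3 and 4 rather than 0, 1 and 2.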
Let us load Pandas.
import pandas as pd
Let us use the gapminder data from the Software Carpentry website and load it as a Pandas dataframe. The gapminder dataframe has over 1700 rows, each corresponding to a country in a given year, and 6 columns.
gapminder_url = 'https://bit.ly/2cLzoxH'
gapminder = pd.read_csv(gapminder_url)
gapminder.head()
Let us do some dataframe manipulation to get a smaller dataframe. Let us first drop a few columns just for ease of visualizing the output dataframe.
gapminder = gapminder.drop(['pop', 'gdpPercap'], axis=1)
print(gapminder.shape)
(1704, 4)
Now our dataframe has just 4 columns and all the rows. Let us filter it to select the rows for countries in the Oceania continent and for years after 2000.
gapminder_ocean = gapminder[(gapminder.year > 2000) & (gapminder.continent == 'Oceania')]
gapminder_ocean.shape
(4, 4)
After filtering, we have a dataframe with just 4 rows corresponding to two countries in the Oceania continent. Also note that the row index of the dataframe is 70, 71, 1102, and 1103. These are the labels these rows had in the original dataframe.
print(gapminder_ocean)
          country  year continent  lifeExp
70      Australia  2002   Oceania   80.370
71      Australia  2007   Oceania   81.235
1102  New Zealand  2002   Oceania   79.110
1103  New Zealand  2007   Oceania   80.204
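One practical consequence of these leftover labels is that label-based lookups no longer line up with row positions. The sketch below rebuilds a small stand-in for the filtered dataframe by hand (the name `ocean` is just for illustration) so that it runs without downloading the data.

```python
import pandas as pd

# Hand-built stand-in for the filtered frame, with the same row labels.
ocean = pd.DataFrame(
    {"country": ["Australia", "Australia", "New Zealand", "New Zealand"],
     "year": [2002, 2007, 2002, 2007]},
    index=[70, 71, 1102, 1103],
)

first_by_position = ocean.iloc[0]  # position-based: the first row
first_by_label = ocean.loc[70]     # label-based: the row labelled 70

try:
    ocean.loc[0]                   # there is no row labelled 0
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(first_by_position["country"])  # Australia
print(first_by_label["year"])        # 2002
print(label_zero_found)              # False
```

Here `.iloc[0]` and `.loc[70]` return the same row, while `.loc[0]` fails because no row carries the label 0.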
pandas reset_index() to reset row index to zero
We can reset the row index in pandas with reset_index() to make the index start from 0. Calling reset_index() on the dataframe gives:
gapminder_ocean.reset_index()
   index      country  year continent  lifeExp
0     70    Australia  2002   Oceania   80.370
1     71    Australia  2007   Oceania   81.235
2   1102  New Zealand  2002   Oceania   79.110
3   1103  New Zealand  2007   Oceania   80.204
Now the row index starts from 0. Note also that pandas reset_index() keeps the original row index as a new column named index.
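The column is called index only because the original index has no name. As a small sketch on a hand-built stand-in frame (the names `ocean` and `orig_row` are made up for illustration), giving the index a name first changes the name of the new column:

```python
import pandas as pd

# Hand-built stand-in for the filtered frame, with the same row labels.
ocean = pd.DataFrame(
    {"country": ["Australia", "Australia", "New Zealand", "New Zealand"],
     "lifeExp": [80.370, 81.235, 79.110, 80.204]},
    index=[70, 71, 1102, 1103],
)

# Unnamed index: the old labels land in a column called "index".
unnamed = ocean.reset_index()
print(list(unnamed.columns))  # ['index', 'country', 'lifeExp']

# Named index: the new column takes the index name instead.
named = ocean.rename_axis("orig_row").reset_index()
print(list(named.columns))    # ['orig_row', 'country', 'lifeExp']
```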
Often you don’t need the extra column with the original row index. We can tell pandas not to keep the original index with the argument drop=True.
gapminder_ocean.reset_index(drop=True)
       country  year continent  lifeExp
0    Australia  2002   Oceania   80.370
1    Australia  2007   Oceania   81.235
2  New Zealand  2002   Oceania   79.110
3  New Zealand  2007   Oceania   80.204
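Note that reset_index() returns a new dataframe by default and leaves the one it was called on unchanged. A quick check on a hand-built stand-in frame (the names `ocean` and `clean` are only for illustration) might look like this:

```python
import pandas as pd

# Hand-built stand-in for the filtered frame, with the same row labels.
ocean = pd.DataFrame(
    {"country": ["Australia", "Australia", "New Zealand", "New Zealand"],
     "year": [2002, 2007, 2002, 2007]},
    index=[70, 71, 1102, 1103],
)

clean = ocean.reset_index(drop=True)

print(list(clean.index))    # [0, 1, 2, 3]
print(list(clean.columns))  # ['country', 'year'], no extra index column
print(list(ocean.index))    # [70, 71, 1102, 1103], the original is untouched
```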
reset_index() to reset pandas index to zero in-place
If you want to reset the index to zero in place, you can also add the inplace=True argument.
gapminder_ocean.reset_index(drop=True, inplace=True)
gapminder_ocean
       country  year continent  lifeExp
0    Australia  2002   Oceania   80.370
1    Australia  2007   Oceania   81.235
2  New Zealand  2002   Oceania   79.110
3  New Zealand  2007   Oceania   80.204
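One thing to watch for: with inplace=True the call modifies the dataframe and returns None, so its result should not be assigned back to a variable. The sketch below uses small hand-built frames (`ocean` and `other` are illustrative names) and also shows plain reassignment as an alternative that gives the same result.

```python
import pandas as pd

# Hand-built stand-in for the filtered frame, with the same row labels.
ocean = pd.DataFrame(
    {"country": ["Australia", "Australia", "New Zealand", "New Zealand"],
     "year": [2002, 2007, 2002, 2007]},
    index=[70, 71, 1102, 1103],
)

# In-place reset: the frame itself changes and the call returns None.
returned = ocean.reset_index(drop=True, inplace=True)
print(returned)           # None
print(list(ocean.index))  # [0, 1, 2, 3]

# Alternative without inplace: reassign the returned dataframe.
other = pd.DataFrame({"year": [2002, 2007]}, index=[70, 71])
other = other.reset_index(drop=True)
print(list(other.index))  # [0, 1]
```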