How To Get The Memory Usage of Pandas Dataframe?

Knowing how much memory a Pandas dataframe uses can be extremely useful when working with bigger dataframes. In this post we will see two examples of estimating the memory usage of a Pandas dataframe using built-in Pandas functionality. We will first see how to find the total memory usage of a Pandas dataframe with the Pandas info() function, and then we will see an example of finding the memory usage of each variable in the dataframe with the Pandas memory_usage() function.

Let us load Pandas first and check its version.

import pandas as pd
pd.__version__
'1.0.0'

We will use a dataset from the TidyTuesday project on college tuition costs across the US.

data_url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/tuition_cost.csv"
df = pd.read_csv(data_url)
df.iloc[0:5,0:5]

We can see that our dataframe contains different data types.


        name	state	state_code	type	degree_length
0	Aaniiih Nakoda College	Montana	MT	Public	2 Year
1	Abilene Christian University	Texas	TX	Private	4 Year
2	Abraham Baldwin Agricultural College	Georgia	GA	Public	2 Year
3	Academy College	Minnesota	MN	For Profit	2 Year
4	Academy of Art University	California	CA	For Profit	4 Year

Total Memory Usage of Pandas Dataframe with info()

We can use the Pandas info() function to find the total memory usage of a dataframe. The Pandas info() function is mainly used to get information about each of the columns: their data types and how many values are not null. The info() function also reports the memory usage at the end of its output.

To get the full memory usage, we provide the memory_usage="deep" argument to info().

df.info(memory_usage="deep")

We get all the basic information about the dataframe, and towards the end we also see "memory usage: 1.1 MB" for the dataframe.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2973 entries, 0 to 2972
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   name                  2973 non-null   object 
 1   state                 2921 non-null   object 
 2   state_code            2973 non-null   object 
 3   type                  2973 non-null   object 
 4   degree_length         2973 non-null   object 
 5   room_and_board        1879 non-null   float64
 6   in_state_tuition      2973 non-null   int64  
 7   in_state_total        2973 non-null   int64  
 8   out_of_state_tuition  2973 non-null   int64  
 9   out_of_state_total    2973 non-null   int64  
dtypes: float64(1), int64(4), object(5)
memory usage: 1.1 MB
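
For comparison, calling info() without the deep option counts only the 8-byte pointers for object columns, so it reports a smaller, approximate figure flagged with a "+". A minimal sketch (the exact number shown may vary by Pandas version):

df.info()
# same column summary as above, but the last line reads roughly:
# memory usage: 232.4+ KB
# the "+" flags that the string contents of object columns are not counted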

Memory Usage of Each Column in Pandas Dataframe with memory_usage()

The Pandas info() function gave the total memory used by a dataframe. However, sometimes you may want the memory used by each column in a Pandas dataframe.

We can get each column/variable level memory usage using Pandas memory_usage() function.

df.memory_usage()

We get the memory used by each column/variable in bytes. By default, memory_usage() counts only the shallow memory of columns with data type object, that is, the space for the pointers rather than the strings they point to.

Index                     128
name                    23784
state                   23784
state_code              23784
type                    23784
degree_length           23784
room_and_board          23784
in_state_tuition        23784
in_state_total          23784
out_of_state_tuition    23784
out_of_state_total      23784
dtype: int64
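
The identical 23784-byte figures for the object columns are just that pointer storage: one 8-byte reference per row, assuming a 64-bit build. A quick sanity check:

# one 8-byte pointer per row for each object column (64-bit build assumed)
len(df) * 8
# 23784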

We can get the memory usage including object data types by passing the argument deep=True to the memory_usage() function.

df.memory_usage(deep=True)

We again get the bytes used by each variable, but this time the figures include the full memory used by the object data types.

Index                      128
name                    248346
state                   193391
state_code              175407
type                    189007
degree_length           187298
room_and_board           23784
in_state_tuition         23784
in_state_total           23784
out_of_state_tuition     23784
out_of_state_total       23784
dtype: int64

Since the memory_usage() function returns a Series of per-column memory usage, we can sum it to get the total memory used.

df.memory_usage(deep=True).sum()
1112497
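
To express the total in megabytes, we can divide the byte count by 1024**2; the result, roughly 1.06 MB, lines up with the "1.1 MB" reported by info() above.

# convert the byte total to megabytes for easier comparison with info()
df.memory_usage(deep=True).sum() / 1024 ** 2
# about 1.06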

We can see that the memory usage estimated by Pandas info() and by memory_usage() with the deep=True option match: 1112497 bytes is about 1.1 MB. Typically, object variables have a large memory footprint. Converting object columns containing strings to the categorical data type can reduce the memory footprint.
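
As a rough sketch of that idea (assuming the low-cardinality string columns in this dataset are good candidates), we can convert them to the category dtype and re-check memory_usage(); the exact savings depend on the data.

# convert repeated string columns to the category dtype and compare totals
df_cat = df.copy()
for col in ["state", "state_code", "type", "degree_length"]:
    df_cat[col] = df_cat[col].astype("category")

print(df.memory_usage(deep=True).sum())      # original total, in bytes
print(df_cat.memory_usage(deep=True).sum())  # smaller total after conversion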

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.