Python and R Tips

Learn Data Science with Python and R

How To Get The Memory Usage of Pandas Dataframe?

March 31, 2020 by cmdlinetips

Knowing how much memory a Pandas dataframe uses can be extremely useful when working with larger dataframes. In this post we will see two ways of estimating the memory usage of a Pandas dataframe using built-in Pandas functionality. We will first find the total memory usage of a dataframe with the info() function, and then find the memory usage of each variable in the dataframe with the memory_usage() function.

Let us load Pandas first and check its version.

import pandas as pd
pd.__version__
1.0.0

We will use a dataset from the TidyTuesday project on college tuition costs across the USA.

data_url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-10/tuition_cost.csv"
df = pd.read_csv(data_url)
df.iloc[0:5,0:5]

The first five rows and columns look like this; the full dataframe contains both text and numeric columns.


	name	state	state_code	type	degree_length
0	Aaniiih Nakoda College	Montana	MT	Public	2 Year
1	Abilene Christian University	Texas	TX	Private	4 Year
2	Abraham Baldwin Agricultural College	Georgia	GA	Public	2 Year
3	Academy College	Minnesota	MN	For Profit	2 Year
4	Academy of Art University	California	CA	For Profit	4 Year

Total Memory Usage of Pandas Dataframe with info()

We can use the Pandas info() function to find the total memory usage of a dataframe. info() is mainly used to summarize each column: its data type and how many of its values are non-null. It also reports the memory usage at the end of its output.

To get the full memory usage, we pass the argument memory_usage="deep" to info().

df.info(memory_usage="deep")

We get all basic information about the dataframe and towards the end we also get the “memory usage: 1.1 MB” for the data frame.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2973 entries, 0 to 2972
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   name                  2973 non-null   object 
 1   state                 2921 non-null   object 
 2   state_code            2973 non-null   object 
 3   type                  2973 non-null   object 
 4   degree_length         2973 non-null   object 
 5   room_and_board        1879 non-null   float64
 6   in_state_tuition      2973 non-null   int64  
 7   in_state_total        2973 non-null   int64  
 8   out_of_state_tuition  2973 non-null   int64  
 9   out_of_state_total    2973 non-null   int64  
dtypes: float64(1), int64(4), object(5)
memory usage: 1.1 MB
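If we only want the memory line rather than the whole report, one option is to write the info() output to a text buffer and keep its last line. The sketch below uses a small made-up dataframe (toy) and a hypothetical helper (memory_line) instead of the tuition data, so it runs without a download. It also shows that, without "deep", info() marks the figure with a "+" to signal that object columns are only partly counted.

```python
import io

import pandas as pd

# Small made-up dataframe; dtype=object mirrors the string columns above.
toy = pd.DataFrame({
    "state": pd.Series(["Montana", "Texas", "Georgia"] * 100, dtype=object),
    "tuition": range(300),
})


def memory_line(frame, deep=False):
    """Return the final 'memory usage: ...' line of frame.info()."""
    buffer = io.StringIO()
    frame.info(buf=buffer, memory_usage="deep" if deep else True)
    return buffer.getvalue().strip().splitlines()[-1]


shallow_line = memory_line(toy)          # lower bound, flagged with "+"
deep_line = memory_line(toy, deep=True)  # includes the string contents

print(shallow_line)
print(deep_line)
```

The exact numbers depend on the Pandas version and platform, so treat the printed values as illustrative.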

Memory Usage of Each Column in Pandas Dataframe with memory_usage()

Pandas info() function gave the total memory used by a dataframe. However, sometimes you may want memory used by each column in a Pandas dataframe.

We can get the memory usage of each column/variable using the Pandas memory_usage() function.

df.memory_usage()

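We get the memory used by each column/variable in bytes. By default, memory_usage() does not count the contents of columns with data type object. It counts only the pointer stored for each row, which is why every column below shows the same 23784 bytes (2973 rows × 8 bytes on a 64-bit system).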

Index                     128
name                    23784
state                   23784
state_code              23784
type                    23784
degree_length           23784
room_and_board          23784
in_state_tuition        23784
in_state_total          23784
out_of_state_tuition    23784
out_of_state_total      23784
dtype: int64
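The identical values above follow from simple arithmetic: rows times bytes per element. Here is a minimal sketch on a made-up dataframe (toy, not the tuition data) that checks this for an object column and an int64 column. The pointer size is read from NumPy rather than hard-coded, since it is 8 bytes only on 64-bit platforms.

```python
import numpy as np
import pandas as pd

n_rows = 100

# Made-up data; dtype=object keeps the strings as Python objects.
toy = pd.DataFrame({
    "name": pd.Series(["Aaniiih Nakoda College", "Academy College"] * 50,
                      dtype=object),
    "in_state_tuition": np.arange(n_rows, dtype="int64"),
})

# Shallow (default) usage, leaving out the index for clarity.
shallow = toy.memory_usage(index=False)

pointer_size = np.dtype(object).itemsize  # 8 on a 64-bit system

print(shallow["name"], n_rows * pointer_size)   # object column: pointers only
print(shallow["in_state_tuition"], n_rows * 8)  # int64 column: 8 bytes per value

# The same arithmetic for the tuition data: 2973 rows of 8 bytes each.
print(2973 * 8)
```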

We can include the memory used by object columns by passing the argument deep=True to memory_usage().

df.memory_usage(deep=True)

We again get the bytes used by each variable, but this time the values for the object columns include the memory used by the strings themselves.

Index                      128
name                    248346
state                   193391
state_code              175407
type                    189007
degree_length           187298
room_and_board           23784
in_state_tuition         23784
in_state_total           23784
out_of_state_tuition     23784
out_of_state_total       23784
dtype: int64

Since memory_usage() returns a Pandas Series of per-column memory usage, we can sum it to get the total memory used.

df.memory_usage(deep=True).sum()
1112497
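To compare this byte count with the figure printed by info(), we can convert it to megabytes. This is a small sketch that assumes info() scales by 1024 at each unit step and rounds to one decimal place; that is our understanding of how Pandas formats the value, so treat it as an assumption.

```python
# Total reported by df.memory_usage(deep=True).sum() above.
total_bytes = 1112497

# Bytes -> KB -> MB, dividing by 1024 at each step (assumed convention).
total_mb = total_bytes / 1024 ** 2

label = f"{total_mb:.1f} MB"
print(label)
```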

We can see that the memory usage reported by info() with memory_usage="deep" and by memory_usage() with deep=True match. Typically, object variables have a large memory footprint. Converting an object variable that holds strings with few distinct values to the categorical type can reduce the memory footprint.
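As a rough illustration of that last point, the sketch below converts a made-up object column with only three distinct values to the category type and compares the deep memory usage before and after. The column name and values are borrowed from the tuition data for familiarity, but the data itself is invented.

```python
import pandas as pd

# Made-up column with many rows but only three distinct strings.
toy = pd.DataFrame({
    "type": pd.Series(["Public", "Private", "For Profit"] * 1000,
                      dtype=object),
})

# Deep usage of the column as plain Python strings.
before = toy["type"].memory_usage(deep=True, index=False)

# Deep usage after conversion: small integer codes plus one copy of each label.
after = toy["type"].astype("category").memory_usage(deep=True, index=False)

print(before, after)
```

The saving shrinks as the number of distinct values grows, so a column such as name, where almost every value is unique, may gain little from the conversion.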

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.
