• Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Python and R Tips

Learn Data Science with Python and R

  • Home
  • Python
  • Pandas
    • Pandas 101
  • tidyverse
    • tidyverse 101
  • R
  • Linux
  • Conferences
  • Python Books
  • About
    • Privacy Policy
You are here: Home / Pandas 101 / How To Compare Two Dataframes with Pandas compare?

How To Compare Two Dataframes with Pandas compare?

August 2, 2020 by cmdlinetips

In this post, we will learn how to compare two Pandas dataframes and summarize their differences using Pandas compare() function.

Sometimes you may have two similar dataframes and would like to know exactly what those differences are between the two data frames. Starting from Pandas 1.1.0 version, Pandas has a new function compare() that lets you compare two data frames or Series and identify the differences between them and nicely tabulate them.

Let us load Pandas and Numpy.

import pandas as pd
import numpy as np

Let us check Pandas version and make sure we have Pandas version 1.1.0 and above. Otherwise make sure to install the latest version of Pandas using conda/pip install.

pd.__version__
'1.1.0'

We will first create a toy dataframe with three columns and four rows.

# create dataframe
df1 = pd.DataFrame(
    {
       "col1": ["a", "v", "x", "y"],
       "col2": [1.0, 2.0, 3.0, np.nan],
       "col3": [7.0, 8.0, 9.0, 3.0]
     },
    columns=["col1", "col2", "col3"],
)

And this is how our data looks like.

df1

      col1	col2	col3
0	a	1.0	7.0
1	v	2.0	8.0
2	x	3.0	9.0
3	y	NaN	3.0

Let us create second dataframe by copying the first dataframe and making two changes. In this example, we change the values of first row in first column and second row in second column.

# create a copy of dataframe
df2 = df1.copy()
# change the values of few elements
df2.loc[0, 'col1'] = 'zz'
df2.loc[1, 'col2'] = '10'

If we compare it to the first data frame, it differs in just two places in the first two rows.

df2

     col1	col2	col3
0	zz	1	7.0
1	v	10	8.0
2	x	3	9.0
3	y	NaN	3.0

Let us use Pandas compare() function to summarize the differences between the two dataframes. Pandas compare() function outputs the columns and elements that is different between the dataframes. “self” label in the result correspond to the dataframe that we compare and “other” label corresponds to the dataframe that we compare to.

By default, if two corresponding values are equal, they will be shown as NaN. Furthermore, if all values in an entire row / column, the row / column will be omitted from the result. The remaining differences will be aligned on columns.

df1.compare(df2)

            col1	col2
       self	other	self	other
0	a	zz	NaN	NaN
1	NaN	NaN	2.0	10

We can also change how Pandas compare display result. Here with align_axis=”rows” or align_axis=0, Pandas compare() function displays self and other in separate rows.

df1.compare(df2, align_axis='rows')

        col1	 col2
0	self	a	NaN
      other	zz	NaN
1	self	NaN	2
      other	NaN	10

Share this:

  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on X (Opens in new window) X

Related posts:

Default ThumbnailHow To Concatenate Two or More Pandas DataFrames? Default ThumbnailHow To Add Identifier Column When Concatenating Pandas data frames? Default ThumbnailHow To Merge/Join DataFrames with Pandas in Python? Pandas Compute Percent ChangePandas pct_change() to compute percent change across columns/rows

Filed Under: Pandas 101, Pandas compare, Python Tips Tagged With: compare two dataframes pandas, pandas compare() function

Primary Sidebar

Subscribe to Python and R Tips and Learn Data Science

Learn Pandas in Python and Tidyverse in R

Tags

Altair Basic NumPy Book Review Data Science Data Science Books Data Science Resources Data Science Roundup Data Visualization Dimensionality Reduction Dropbox Dropbox Free Space Dropbox Tips Emacs Emacs Tips ggplot2 Linux Commands Linux Tips Mac Os X Tips Maximum Likelihood Estimation in R MLE in R NumPy Pandas Pandas 101 Pandas Dataframe Pandas Data Frame pandas groupby() Pandas select columns Pandas select_dtypes Python Python 3 Python Boxplot Python Tips R rstats R Tips Seaborn Seaborn Boxplot Seaborn Catplot Shell Scripting Sparse Matrix in Python tidy evaluation tidyverse tidyverse 101 Vim Vim Tips

RSS RSS

  • How to convert row names to a column in Pandas
  • How to resize an image with PyTorch
  • Fashion-MNIST data from PyTorch
  • Pandas case_when() with multiple examples
  • An Introduction to Statistical Learning: with Applications in Python Is Here
  • 10 Tips to customize ggplot2 title text
  • 8 Plot types with Matplotlib in Python
  • PCA on S&P 500 Stock Return Data
  • Linear Regression with Matrix Decomposition Methods
  • Numpy’s random choice() function

Copyright © 2025 · Lifestyle Pro on Genesis Framework · WordPress · Log in

Go to mobile version