In this tutorial, we will learn how to use the Pandas map() function to replace values in multiple columns using a dictionary. Earlier, we saw how to use the Pandas replace() function to change the values in multiple columns using a dictionary. As we all know, there is often more than one solution to a problem.
The Pandas map() function works on a Pandas Series, not directly on a DataFrame. The Pandas documentation describes map() as:
Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.
Therefore, here we combine Pandas map() with the Pandas reshaping functions stack() and unstack() to substitute values in multiple columns using a dictionary. In our dictionary, the keys are the column values that we want to replace, and the values are what we want to see in the dataframe instead.
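Before working with a whole dataframe, it may help to see map() on a single Series. The snippet below is a minimal sketch; the Series `s` and the dictionary `mapping` are made-up names for this illustration only. Note that a value with no matching key in the dictionary comes back as a missing value (NaN).

```python
import pandas as pd

# A small Series; "name9" deliberately has no entry in the dictionary below.
s = pd.Series(["name1", "name2", "name9"])

# Hypothetical mapping, used only for this illustration.
mapping = {"name1": "Symbol1", "name2": "Symbol2"}

# map() looks up each value of the Series in the dictionary.
mapped = s.map(mapping)
print(mapped)

# Values without a matching key become missing (NaN).
print(mapped.isna().tolist())
```

So, when using a dictionary with map(), it is worth checking that the dictionary covers every value you expect to see.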
We will use the Pandas map() function, together with stack() and unstack(), to change the values of multiple columns at the same time. Let us first load Pandas.
    import pandas as pd
    # import sample from the random module
    from random import sample

Let us create some data as before using sample() from the random module.

    # Create a list of names in Python
    name_list = ["name1", "name2", "name3", "name4"]

Using the name list, let us create three variables with the sample() function and build a dataframe with three columns.

    cluster1 = sample(name_list, 4)
    cluster2 = sample(name_list, 4)
    cluster3 = sample(name_list, 4)
    df = pd.DataFrame({"cluster1": cluster1,
                       "cluster2": cluster2,
                       "cluster3": cluster3,
                       })
    df
Our dataframe looks like this.
      cluster1 cluster2 cluster3
    0    name1    name1    name4
    1    name4    name3    name1
    2    name3    name4    name3
    3    name2    name2    name2
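Since sample() draws at random, the dataframe will look different every time the code is run. If you want the same values on each run, one option is to seed the random module first. Here is a small sketch; the seed value 42 is arbitrary and not part of the original example.

```python
import random
from random import sample

name_list = ["name1", "name2", "name3", "name4"]

# Seeding the generator makes the draw repeatable (42 is an arbitrary choice).
random.seed(42)
first_draw = sample(name_list, 4)

# Re-seeding with the same value reproduces the same shuffled order.
random.seed(42)
second_draw = sample(name_list, 4)

print(first_draw == second_draw)
```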
Let us use the zip() function to create a dictionary that maps each current value to its replacement.

    symbol_list = ["Symbol1", "Symbol2", "Symbol3", "Symbol4"]
    # create a dictionary
    n2s = dict(zip(name_list, symbol_list))
    n2s

    {'name1': 'Symbol1',
     'name2': 'Symbol2',
     'name3': 'Symbol3',
     'name4': 'Symbol4'}

We will use the common idea of reshaping a wide dataset into long form, then use the map() function to substitute the values using the dictionary, and finally reshape back to the original dataframe’s shape.
Let us first see the result of applying the stack() function. (The values shown below come from a different random draw than the dataframe printed above.)

    df.stack()

    0  cluster1    name1
       cluster2    name4
       cluster3    name2
    1  cluster1    name4
       cluster2    name1
       cluster3    name4
    2  cluster1    name2
       cluster2    name3
       cluster3    name3
    3  cluster1    name3
       cluster2    name2
       cluster3    name1
    dtype: object
Now, let us see the result of using map() to replace column values after using stack().
    df.stack().map(n2s)

    0  cluster1    Symbol1
       cluster2    Symbol4
       cluster3    Symbol2
    1  cluster1    Symbol4
       cluster2    Symbol1
       cluster3    Symbol4
    2  cluster1    Symbol2
       cluster2    Symbol3
       cluster3    Symbol3
    3  cluster1    Symbol3
       cluster2    Symbol2
       cluster3    Symbol1
    dtype: object
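In the example above the dictionary covers every value, so nothing goes missing. If your dictionary covers only some of the values, map() returns NaN for the rest. One possible way to keep the original value in those cases is to fill the gaps from the stacked Series before reshaping back. This is only a sketch; `df_partial` and `n2s_partial` are hypothetical names and data made up for the illustration.

```python
import pandas as pd

# Hypothetical data: "other" has no entry in the dictionary below.
df_partial = pd.DataFrame({
    "cluster1": ["name1", "name2"],
    "cluster2": ["name2", "other"],
})
n2s_partial = {"name1": "Symbol1", "name2": "Symbol2"}

# Reshape to long form and map; the unmapped value becomes NaN.
stacked = df_partial.stack()
mapped = stacked.map(n2s_partial)

# Fall back to the original value wherever the mapping produced NaN.
filled = mapped.fillna(stacked)

# Reshape back to wide form.
result = filled.unstack()
print(result)
```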
Now that we have substituted the column values, we can apply unstack() to reshape from long form back to wide form. This gives us a dataframe with the replaced values.

    df.stack().map(n2s).unstack()

      cluster1 cluster2 cluster3
    0  Symbol1  Symbol4  Symbol2
    1  Symbol4  Symbol1  Symbol4
    2  Symbol2  Symbol3  Symbol3
    3  Symbol3  Symbol2  Symbol1
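To put the pieces together, here is a self-contained sketch of the whole approach. It uses fixed values (copied from the stacked output shown above) instead of sample(), so the result is the same on every run. The helper name `replace_values_with_map` and the dataframe name `df_fixed` are made up for this illustration. As a sanity check, the sketch also compares the result with replace() and with a column-by-column map() via apply(); for data like this, where the dictionary covers every value, all three should agree.

```python
import pandas as pd


def replace_values_with_map(frame, mapping):
    """Stack to long form, map each value, and unstack back to wide form."""
    return frame.stack().map(mapping).unstack()


name_list = ["name1", "name2", "name3", "name4"]
symbol_list = ["Symbol1", "Symbol2", "Symbol3", "Symbol4"]
n2s = dict(zip(name_list, symbol_list))

# Fixed values instead of sample(), so the example is reproducible.
df_fixed = pd.DataFrame({
    "cluster1": ["name1", "name4", "name2", "name3"],
    "cluster2": ["name4", "name1", "name3", "name2"],
    "cluster3": ["name2", "name4", "name3", "name1"],
})

# The stack/map/unstack approach from this tutorial.
via_map = replace_values_with_map(df_fixed, n2s)

# Two alternatives for comparison: replace() and a per-column map().
via_replace = df_fixed.replace(n2s)
via_apply = df_fixed.apply(lambda col: col.map(n2s))

print(via_map)
print(via_map.values.tolist() == via_replace.values.tolist())
print(via_map.values.tolist() == via_apply.values.tolist())
```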