In this tutorial, we will learn how to convert two columns of a Pandas dataframe into a dictionary. As the picture below shows, the goal is to use the values of one column as the dictionary keys and the values of the second column as the dictionary values. Note that this is different from creating a dictionary with the column names as keys.
We will see two ways to use Pandas’ to_dict() function to convert two columns into a dictionary.
First, we will learn how to use Python’s zip() function to create a dictionary from two columns, and then we will learn how to use Pandas’ to_dict() function to create the same dictionary in two different ways.
Let us first load Pandas.
import pandas as pd
We will use the US states data set containing two-letter state codes and state names. The data is available at cmdlinetips.com’s GitHub page.
states_df = pd.read_csv("https://raw.githubusercontent.com/cmdlinetips/data/master/us_states.tsv", sep="\t")
For our examples, let us take the first five rows of the data. The subset looks like this.
df = states_df.head()
df
  state   latitude   longitude        name
0    AK  63.588753 -154.493062      Alaska
1    AL  32.318231  -86.902298     Alabama
2    AR  35.201050  -91.831833    Arkansas
3    AZ  34.048928 -111.093731     Arizona
4    CA  36.778261 -119.417932  California
Pandas Columns to Dictionary with zip
Our goal is to create a dictionary with state codes as keys and state names as values. I have been using Python’s zip() function to pair up the two columns as tuples and then the dict() function to convert those tuples into a dictionary.
In Python 3, the zip() function takes iterables as its arguments and returns an iterator.
zip(df.state, df.name)
<zip at 0x7fb78d7bd4b0>
zip() function’s output is of zip type.
type(zip(df.state, df.name))
We can use the list() function on the result of zip() to see the list of tuples.
list(zip(df.state, df.name))
[('AK', 'Alaska'),
 ('AL', 'Alabama'),
 ('AR', 'Arkansas'),
 ('AZ', 'Arizona'),
 ('CA', 'California')]
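One practical consequence of zip() returning an iterator is that it can be walked through only once. Here is a minimal sketch of that behavior, using plain Python lists as stand-ins for df.state and df.name (the variable names are made up for illustration):

```python
# Plain lists standing in for df.state and df.name (illustrative names).
codes = ["AK", "AL", "AR"]
names = ["Alaska", "Alabama", "Arkansas"]

pairs = zip(codes, names)

# The first pass walks the iterator and collects every (code, name) tuple.
first_pass = list(pairs)

# The iterator is now exhausted, so a second pass collects nothing.
second_pass = list(pairs)

print(first_pass)   # [('AK', 'Alaska'), ('AL', 'Alabama'), ('AR', 'Arkansas')]
print(second_pass)  # []
```

So if we want to inspect the pairs and also build a dictionary from them, we should call zip() again rather than reuse the same zip object.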
Applying the dict() function to the zip object built from the two columns gives us the dictionary we need.
dict(zip(df.state, df.name))
{'AK': 'Alaska',
 'AL': 'Alabama',
 'AR': 'Arkansas',
 'AZ': 'Arizona',
 'CA': 'California'}
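The zip approach assumes the key column has no repeated values. As a rough sketch of what happens otherwise, here is a small hand-built dataframe (not the states data set) with a hypothetical repeated "AK" row; the names dup_df and state_to_name are only for illustration:

```python
import pandas as pd

# Small hand-built frame; the repeated "AK" row is hypothetical and is
# included only to show how duplicate keys behave.
dup_df = pd.DataFrame({
    "state": ["AK", "AL", "AK"],
    "name": ["Alaska", "Alabama", "Alaska (second row)"],
})

# Bracket access works even when a column name clashes with a
# dataframe attribute, so it is used here instead of dot access.
state_to_name = dict(zip(dup_df["state"], dup_df["name"]))

# A dict keeps one value per key: the later "AK" row overwrites the earlier one.
print(state_to_name)
# {'AK': 'Alaska (second row)', 'AL': 'Alabama'}

# Checking uniqueness up front makes such silent overwrites visible.
print(dup_df["state"].is_unique)  # False
```

With the five-row states subset every code is unique, so nothing is lost there.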
Pandas Columns to Dictionary with Pandas’ to_dict() function
I recently came across Pandas’ to_dict() function. It is a versatile function for converting a Pandas dataframe or Series into a dictionary. By default, calling to_dict() on a dataframe creates a dictionary of dictionaries: it uses the column names as keys, and for each column it creates an inner dictionary that maps the index to the column values.
However, our purpose is slightly different: we want one of the columns to supply the dictionary keys and the other column to supply the values. To create a dictionary from two columns, we first create a Pandas Series with the key column as its index and the other column as its values. Then we can apply Pandas’ to_dict() function to get the dictionary.
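To make the dictionary-of-dictionaries shape concrete, here is a minimal sketch on a two-row, hand-built dataframe (small_df and nested are illustrative names, not part of the states data set):

```python
import pandas as pd

# Two-row frame built by hand so the output stays short.
small_df = pd.DataFrame({
    "state": ["AK", "AL"],
    "name": ["Alaska", "Alabama"],
})

# With no arguments, to_dict() maps column name -> {index label -> value}.
nested = small_df.to_dict()
print(nested)
# {'state': {0: 'AK', 1: 'AL'}, 'name': {0: 'Alaska', 1: 'Alabama'}}

# Looking up a single cell means going through the column, then the index.
print(nested["name"][0])  # Alaska
```

The inner keys are the default integer index, which is why the result is not yet the code-to-name mapping we are after.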
pd.Series(df.name.values,index=df.state).to_dict()
{'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AZ': 'Arizona', 'CA': 'California'}
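The .values in the line above matters. A short sketch on a hand-built two-row dataframe (the names small_df, by_position and by_label are illustrative) shows what happens with and without it:

```python
import pandas as pd

small_df = pd.DataFrame({
    "state": ["AK", "AL"],
    "name": ["Alaska", "Alabama"],
})

# With .values, the names are a plain array, so they are paired with the
# new index purely by position.
by_position = pd.Series(small_df["name"].values, index=small_df["state"])
print(by_position.to_dict())  # {'AK': 'Alaska', 'AL': 'Alabama'}

# Without .values, pandas aligns on labels: it looks for 'AK' and 'AL'
# among the existing labels 0 and 1, finds neither, and fills in NaN.
by_label = pd.Series(small_df["name"], index=small_df["state"])
print(by_label.isna().all())  # True
```

Passing the raw values is what lets the state codes become the new index without any label matching.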
Another approach to converting two columns into a dictionary is to first set the column we need as keys to be the index of the dataframe and then use Pandas’ to_dict() function to convert it to a dictionary. This creates a dictionary for every remaining column in the dataframe. Therefore, we select the column we need from the “big” dictionary.
df.set_index('state').to_dict()['name']
{'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AZ': 'Arizona', 'CA': 'California'}
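As a quick sanity check, the sketch below runs the approaches side by side on a hand-built three-row dataframe and also tries one variant that is not in the text above: selecting the "name" column before calling to_dict(), so that only that column is converted. The variable names are illustrative.

```python
import pandas as pd

small_df = pd.DataFrame({
    "state": ["AK", "AL", "AR"],
    "latitude": [63.588753, 32.318231, 35.201050],
    "name": ["Alaska", "Alabama", "Arkansas"],
})

# Approach from the text: convert every column, then pick "name".
via_full_dict = small_df.set_index("state").to_dict()["name"]

# Variant: pick the "name" Series first, so only that column is converted.
via_series = small_df.set_index("state")["name"].to_dict()

# The zip-based approach from earlier, for comparison.
via_zip = dict(zip(small_df["state"], small_df["name"]))

print(via_full_dict == via_series == via_zip)  # True
```

On a wide dataframe the variant skips building inner dictionaries for columns we do not need, though for a small example like this the difference is negligible.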