How To Code a Character Variable into Integer in Pandas

December 14, 2020 by cmdlinetips

Often, while working with a Pandas dataframe containing variables of different data types, one might want to convert a specific character/string/categorical variable into a numerical variable. One use of such a conversion is that it lets us quickly run correlation analyses that include the converted variable.

In this post, we will see multiple examples of converting a character variable into an integer variable in Pandas. For example, we will convert a character variable with three different values, i.e. Adelie, Gentoo, and Chinstrap, into 0/1/2. Note that this is different from converting integer values stored as a character variable, like “1”, “2”, and “3”, to the integers 1/2/3. For that type of conversion, we can use Pandas’ to_numeric() or astype(int).
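For contrast, here is a minimal sketch of that other kind of conversion, where the strings already spell out numbers. The Series and variable names below are made up for illustration and are not part of the penguins data.

```python
import pandas as pd

# Hypothetical example data: integers stored as strings
s = pd.Series(["1", "2", "3"])

# Option 1: astype(int) parses each string as an integer
as_int = s.astype(int)

# Option 2: pd.to_numeric infers a numeric dtype; errors="coerce"
# turns unparseable entries into NaN instead of raising an error
as_num = pd.to_numeric(s, errors="coerce")

print(as_int.tolist())
print(as_num.tolist())
```

Neither call would help with labels like Adelie, which carry no numeric value; that case is what the rest of this post covers.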

How to Code Character Variable as Integers with Pandas?

Let us load the packages needed to illustrate this.

import pandas as pd
import seaborn as sns

We will use the Palmer Penguins dataset, which is available among Seaborn’s built-in datasets.

penguins = sns.load_dataset("penguins")
penguins = penguins.dropna()

You can see that the character variables have the dtype object, the dtype Pandas has traditionally used for strings.

penguins.dtypes

species               object
island                object
bill_length_mm       float64
bill_depth_mm        float64
flipper_length_mm    float64
body_mass_g          float64
sex                   object
dtype: object

1. Coding Character Variable to Integers Using Pandas Series

One way to convert a character variable into integer values is to work with it as a Pandas Series. We can get the variable of interest as a Series with

penguins.species
0      Adelie
1      Adelie
2      Adelie
4      Adelie
5      Adelie
        ...  
338    Gentoo
340    Gentoo
341    Gentoo
342    Gentoo
343    Gentoo
Name: species, Length: 333, dtype: object

And then convert the character variable into a Categorical variable using Pandas astype() function.

penguins.species.astype("category")
0      Adelie
1      Adelie
2      Adelie
4      Adelie
5      Adelie
        ...  
338    Gentoo
340    Gentoo
341    Gentoo
342    Gentoo
343    Gentoo
Name: species, Length: 333, dtype: category
Categories (3, object): ['Adelie', 'Chinstrap', 'Gentoo']

Then get the integers using cat.codes on the categorical variable.

penguins.species.astype("category").cat.codes
0      0
1      0
2      0
4      0
5      0
      ..
338    2
340    2
341    2
342    2
343    2
Length: 333, dtype: int8
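To see which integer stands for which label, we can pair each code with its category. The sketch below uses a small made-up Series in place of penguins.species so that it runs on its own; the names code_to_label and decoded are illustrative.

```python
import pandas as pd

# Small stand-in for penguins.species (illustrative values)
species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Adelie"])

cat = species.astype("category")

# Categories are sorted, and each code is a position in that list
code_to_label = dict(enumerate(cat.cat.categories))
print(code_to_label)

codes = cat.cat.codes
print(codes.tolist())

# Map the integer codes back to the original labels
decoded = codes.map(code_to_label)
print(decoded.tolist())
```

Keeping such a dictionary around makes it possible to recover the labels after the column has been overwritten with integers. Missing values, if any were present, would receive the code -1.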

In order to save the converted variable as part of the original dataframe, we can re-assign as

penguins["species"] = penguins["species"].astype("category").cat.codes

And now our updated dataframe looks like this

penguins.head()
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0        0  Torgersen            39.1           18.7              181.0       3750.0    Male
1        0  Torgersen            39.5           17.4              186.0       3800.0  Female
2        0  Torgersen            40.3           18.0              195.0       3250.0  Female
4        0  Torgersen            36.7           19.3              193.0       3450.0  Female
5        0  Torgersen            39.3           20.6              190.0       3650.0    Male
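By default the categories are sorted alphabetically, so Chinstrap is coded as 1 and Gentoo as 2. If a specific coding is wanted, for example Adelie/Gentoo/Chinstrap as 0/1/2, one option is to pass an explicit category list to pd.Categorical. This is a minimal sketch with a small made-up Series standing in for penguins.species; the name order is illustrative.

```python
import pandas as pd

# Small stand-in for penguins.species (illustrative values)
species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Gentoo"])

# Explicit category order: the position in this list becomes the code
order = ["Adelie", "Gentoo", "Chinstrap"]
codes = pd.Categorical(species, categories=order).codes

print(codes.tolist())
```

Any label missing from the list would be treated as a missing value and coded as -1, so the list should cover every value in the column.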

2. Coding Character Variable to Integers Using Pandas DataFrame

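Another way to code a character variable into an integer variable is to work with the variable as a dataframe object. Note that the re-assignment above replaced species with integers, so reload the dataset with sns.load_dataset("penguins").dropna() to follow along with the original labels. We can subset a Pandas dataframe as follows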

penguins[['species']]


species
0	Adelie
1	Adelie
2	Adelie
4	Adelie
5	Adelie
...	...
338	Gentoo
340	Gentoo
341	Gentoo
342	Gentoo
343	Gentoo
333 rows × 1 columns

And then use the apply() function, which works on one column at a time, to convert each column into integer codes as shown below

penguins[['species']].apply(lambda col: pd.Categorical(col).codes)
	species
0	0
1	0
2	0
4	0
5	0
...	...
338	2
340	2
341	2
342	2
343	2
333 rows × 1 columns

To save the converted variable back into the dataframe, we use

penguins[['species']] = penguins[['species']].apply(lambda col: pd.Categorical(col).codes)
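Because apply() processes each column separately, the same pattern can code several character variables in one step. The sketch below assumes a small made-up dataframe with penguins-like column names, so it runs without downloading the dataset; cols is an illustrative name.

```python
import pandas as pd

# Hypothetical miniature of the penguins data
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "sex": ["Male", "Female", "Female"],
    "body_mass_g": [3750.0, 5000.0, 3500.0],
})

# Character columns to code; each one gets its own 0..k-1 codes
cols = ["species", "island", "sex"]
df[cols] = df[cols].apply(lambda col: pd.Categorical(col).codes)

print(df)
```

Each column is coded independently, so a 0 in species and a 0 in island refer to unrelated labels.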
