
Python and R Tips

Learn Data Science with Python and R


How to Convert to Best Data Types Automatically in Pandas?

April 13, 2020 by cmdlinetips

When you load your data as a Pandas dataframe, Pandas automatically assigns a data type to each variable/column in the data frame. Typically these are int64, float64 and object data types. With the recent release of Pandas 1.0.0, we can make Pandas infer the best possible data types for the variables in a dataframe.

We will use Pandas’ convert_dtypes() function to convert the columns to the best possible data types automatically. Another big advantage of using convert_dtypes() is that it supports Pandas’ new missing-value indicator, pd.NA.
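As a minimal illustration of the pd.NA point, here is a sketch on a small made-up Series (the values are hypothetical, not taken from the gapminder data): a string column stored as object with one missing entry.

```python
import pandas as pd

# A hypothetical object-dtype column with one missing value (None)
s = pd.Series(["Asia", None, "Europe"], dtype=object)
print(s.dtype)  # object

# convert_dtypes() switches the column to the nullable string dtype
converted = s.convert_dtypes()
print(converted.dtype)

# The missing entry is now represented by pd.NA
print(converted.iloc[1] is pd.NA)  # True
```

The same call works on a whole DataFrame, where it is applied column by column.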

Let us load Pandas and check its version.

import pandas as pd
pd.__version__
1.0.0
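Since convert_dtypes() only exists in Pandas 1.0.0 and later, a script can check for it up front. This is a sketch, assuming a simple major-version check plus an attribute check is enough for your setup; the helper name supports_convert_dtypes is hypothetical.

```python
import pandas as pd


def supports_convert_dtypes() -> bool:
    """Hypothetical helper: True if this Pandas is 1.x or newer and has convert_dtypes()."""
    major = int(pd.__version__.split(".")[0])
    return major >= 1 and hasattr(pd.DataFrame, "convert_dtypes")


if not supports_convert_dtypes():
    raise RuntimeError(
        f"pandas {pd.__version__} is too old; convert_dtypes() needs pandas >= 1.0.0"
    )
print(f"pandas {pd.__version__} supports convert_dtypes()")
```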

We will use the gapminder dataset hosted on cmdlinetips.com’s GitHub page.

data_url = "https://raw.githubusercontent.com/cmdlinetips/data/master/gapminder-FiveYearData.csv"
df = pd.read_csv(data_url)
df.head()

The gapminder dataframe looks like this.

country	year	pop	continent	lifeExp	gdpPercap
0	Afghanistan	1952	8425333.0	Asia	28.801	779.445314
1	Afghanistan	1957	9240934.0	Asia	30.332	820.853030
2	Afghanistan	1962	10267083.0	Asia	31.997	853.100710
3	Afghanistan	1967	11537966.0	Asia	34.020	836.197138
4	Afghanistan	1972	13079460.0	Asia	36.088	739.981106
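If the GitHub URL is not reachable, one way to follow along offline is to read just the five rows shown above from an in-memory CSV string with io.StringIO. This is a sketch that only reproduces the head of the data, not the full dataset.

```python
import io

import pandas as pd

# The first five gapminder rows shown above, as CSV text
csv_text = """country,year,pop,continent,lifeExp,gdpPercap
Afghanistan,1952,8425333.0,Asia,28.801,779.445314
Afghanistan,1957,9240934.0,Asia,30.332,820.853030
Afghanistan,1962,10267083.0,Asia,31.997,853.100710
Afghanistan,1967,11537966.0,Asia,34.020,836.197138
Afghanistan,1972,13079460.0,Asia,36.088,739.981106
"""

# read_csv accepts any file-like object, so StringIO stands in for the URL
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```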

Let us check the data types of the gapminder dataframe.

df.dtypes

We can see that the columns are of type int64, float64 and object. Note that the string variables, country and continent, are stored with the generic “object” data type.

country       object
year           int64
pop          float64
continent     object
lifeExp      float64
gdpPercap    float64
dtype: object
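To list only the columns stored as object, select_dtypes() is one option. The sketch below uses a small hand-built frame with two rows from the table above; the string columns are given dtype=object explicitly so the example does not depend on how a particular Pandas version infers strings.

```python
import pandas as pd

# Two rows from the gapminder table, string columns forced to object dtype
df = pd.DataFrame({
    "country": pd.Series(["Afghanistan", "Afghanistan"], dtype=object),
    "year": [1952, 1957],
    "continent": pd.Series(["Asia", "Asia"], dtype=object),
    "lifeExp": [28.801, 30.332],
})

# Columns that Pandas stores with the generic object dtype
object_cols = df.select_dtypes(include="object").columns.tolist()
print(object_cols)  # ['country', 'continent']

# The values inside an object column are ordinary Python str objects
print(df["country"].map(type).unique())
```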

Now let us use the convert_dtypes() function, available in Pandas starting from version 1.0.0.

By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, and convert_boolean, it is possible to turn off individual conversions to StringDtype, the integer extension types or BooleanDtype, respectively.
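For example, here is a sketch of switching off individual conversions on a small hand-built slice of the rows shown earlier: with convert_string=False the object column stays object, and with convert_integer=False the year column stays NumPy int64.

```python
import pandas as pd

df = pd.DataFrame({
    "country": pd.Series(["Afghanistan", "Afghanistan"], dtype=object),
    "year": [1952, 1957],
})

# Default: object -> string, int64 -> Int64
print(df.convert_dtypes().dtypes)

# Keep strings as object, but still convert integers to Int64
no_strings = df.convert_dtypes(convert_string=False)
print(no_strings.dtypes)

# Keep integers as NumPy int64, but still convert strings
no_ints = df.convert_dtypes(convert_integer=False)
print(no_ints.dtypes)
```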

Let us check the data types returned by convert_dtypes().

df.convert_dtypes().dtypes

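We can see that the convert_dtypes() function has recognised the variables of data type “object” (country and continent) and converted them to the string data type. It has also converted year from int64 to Int64, the nullable integer type, while the float64 columns are left unchanged.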

country       string
year           Int64
pop          float64
continent     string
lifeExp      float64
gdpPercap    float64
dtype: object
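One practical consequence of year becoming Int64 is that a missing value no longer forces an upcast to float. A small sketch, where reindexing to a row label that does not exist is just a hypothetical way to introduce a missing value:

```python
import pandas as pd

years = pd.Series([1952, 1957])          # NumPy int64
nullable_years = years.convert_dtypes()  # nullable Int64

# Label 2 does not exist, so reindexing introduces a missing value
plain = years.reindex([0, 1, 2])
nullable = nullable_years.reindex([0, 1, 2])

print(plain.dtype)             # float64: NaN forces an upcast to float
print(nullable.dtype)          # Int64: the column stays integer
print(nullable.iloc[2] is pd.NA)  # True
```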

This post is part of the series on Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.
