Pandas in Python has numerous functionalities to deal with time series data. One of the simplest tasks in data analysis is to convert date variable that is stored as string type or common object type in in Pandas dataframe to a datetime type variable.
In this post we will see two ways to convert a Pandas column to a datetime type using Pandas.
First, let us load Pandas.
import pandas as pd
Here, we will use sample data and load it directly from cmdlinetips.com‘s github page.
path2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/sample_data_to_convert_column_to_datetime_pandas.csv" #path2data="https://bit.ly/2ZXQ7s9" df = pd.read_csv(path2data)
We can see that the first column looks like date variable with YY-MM-DD format.
df.head() date precipitation temp_max temp_min wind weather 0 2012-01-01 0.0 12.8 5.0 4.7 drizzle 1 2012-01-02 10.9 10.6 2.8 4.5 rain 2 2012-01-03 0.8 11.7 7.2 2.3 rain 3 2012-01-04 20.3 12.2 5.6 4.7 rain 4 2012-01-05 1.3 8.9 2.8 6.1 rain
However, when we check the types of the columns using Pandas’ info() function, we can see that the date column is of type “object”. And we would like to change the column to datetime type.
df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 100 non-null object <==== 1 precipitation 100 non-null float64 2 temp_max 100 non-null float64 3 temp_min 100 non-null float64 4 wind 100 non-null float64 5 weather 100 non-null object dtypes: float64(4), object(2) memory usage: 4.8+ KB
Convert a Column to datetime type while loading the file with read_csv()
One of the ways to convert one or more columns in a data frame, is to specify the variable or column name to be loaded as datetime variable while loading the file using Pandas’ read_csv().
Pandas’ read_csv() function has “parse_dates” argument and it takes the list of column names that we want to load as datetime variable. In this example, we have just column of interest and specify its name.
df = pd.read_csv("sample_data_to_convert_column_to_datetime_pandas.csv", parse_dates=['date'])
We can use info() function and see that now our column is of datetime type.
df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 100 non-null datetime64[ns] <=== 1 precipitation 100 non-null float64 2 temp_max 100 non-null float64 3 temp_min 100 non-null float64 4 wind 100 non-null float64 5 weather 100 non-null object dtypes: datetime64[ns](1), float64(4), object(1) memory usage: 4.8+ KB
One of the biggest advantages of specifying the column to be datetime variable while loading the file is that we can convert multiple columns if needed.
Convert a Column to datetime with Pandas’ to_datetime()
Another option to convert a column to date type is converting it after loading the data as dataframe. Pandas’ to_datetime() function can take a Series object and convert to datetime format.
In this example, we take the column of interest and give it as input to to_datetime() function and then reassign to the same column variable in the dataframe.
df['date']=pd.to_datetime(df['date'], infer_datetime_format=True) df.info()
We can see that now our column is of datetime type.
<class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 100 non-null datetime64[ns]. <==== 1 precipitation 100 non-null float64 2 temp_max 100 non-null float64 3 temp_min 100 non-null float64 4 wind 100 non-null float64 5 weather 100 non-null object dtypes: datetime64[ns](1), float64(4), object(1) memory usage: 4.8+ KB
[…] We can convert the date column into datetime type using Pandas to_datetime() function as shown in the post. […]