How to Combine Year, Month, and Day Columns to single date in Pandas

In this post, we will see how to combine columns containing year, month, and day into a single column of datetime type. We can combine multiple columns into a single date column in multiple ways. First, we will see how can we combine year, month and day column into a column of type datetime, while reading the data using Pandas read_csv() function. Next we will combine year, month and day columns using Pandas’ apply() function.

First, let us load Pandas.

import pandas as pd

We will use sample data containing just three columns, year, month, and day. We will load the data directly from github page.

path2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/combine_year_month_day_into_date_pandas.csv"
df = pd.read_csv(path2data)

Our data looks like this.

df.head()

year	month	day
0	2012	1	1
1	2012	1	2
2	2012	1	3
3	2012	1	4
4	2012	1	5

Combining Year, Month, and Day Columns into Datetime column while reading the file

One of the ways to combine 3 columns corresponding to Year, Month, and Day in a dataframe is to parse them as date variable while loading the file as Pandas dataframe.

While loading the file as Pandas’ data frame using read_csv() function we can specify the column names to be combined into datetime column. We will use “parse_dates” argument to read_csv() function and provide the year,month,and day columns as values for dictionary with new date variable as key.

df = pd.read_csv(path2data,
                 parse_dates= {"date" : ["year","month","day"]})

Now Pandas’ read_csv() combines those columns into a single date column. And we can check the datatype of the new variable using Pandas’ info() function.

df.info()

We can see that it is an object of type “datetime[ns]”.


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    100 non-null    datetime64[ns]
dtypes: datetime64[ns](1) <===
memory usage: 928.0 bytes

Also note that while combining three columns into a single date column, the orginial three columns are not in the dataframe. If you we would like to keep the three columns after combining them into datetime, we need to use “keep_date_col=True”

df = pd.read_csv(path2data,
                 parse_dates= {"date" : ["year","month","day"]},
                keep_date_col=True)

Now we have four columns including the original Year, Month and Day column

df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    100 non-null    datetime64[ns]
 1   year    100 non-null    object        
 2   month   100 non-null    object        
 3   day     100 non-null    object        
dtypes: datetime64[ns](1), object(3)
memory usage: 3.2+ KB

Combining Month, Year and Day columns with Pandas apply()

Another approach to combine multiple columns into a single date column first by pasting the three columns using apply() function.

In this example, we specify the columns of interest. In our sample dataframe, it is all the columns. And then use lambda function to combine the three values in a row using join() function.

cols=["year","month","day"]
df['date'] = df[cols].apply(lambda x: '-'.join(x.values.astype(str)), axis="columns")

We now have a new column for date.

df.head()

	year	month	day	date
0	2012	1	1	2012-1-1
1	2012	1	2	2012-1-2
2	2012	1	3	2012-1-3
3	2012	1	4	2012-1-4
4	2012	1	5	2012-1-5

We can use info() function to see that the new variable is not a datetime object yet.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   year    100 non-null    int64 
 1   month   100 non-null    int64 
 2   day     100 non-null    int64 
 3   date    100 non-null    object
dtypes: int64(3), object(1)
memory usage: 3.2+ KB

We can convert the date column into datetime type using Pandas to_datetime() function as shown in the post.

df['date']=pd.to_datetime(df['date'])