In this post, we will see how to combine columns containing year, month, and day into a single column of datetime type. We can combine multiple columns into a single date column in multiple ways. First, we will see how can we combine year, month and day column into a column of type datetime, while reading the data using Pandas read_csv() function. Next we will combine year, month and day columns using Pandas’ apply() function.
First, let us load Pandas.
import pandas as pd
We will use sample data containing just three columns, year, month, and day. We will load the data directly from github page.
path2data = "https://raw.githubusercontent.com/cmdlinetips/data/master/combine_year_month_day_into_date_pandas.csv" df = pd.read_csv(path2data)
Our data looks like this.
df.head() year month day 0 2012 1 1 1 2012 1 2 2 2012 1 3 3 2012 1 4 4 2012 1 5
Combining Year, Month, and Day Columns into Datetime column while reading the file
One of the ways to combine 3 columns corresponding to Year, Month, and Day in a dataframe is to parse them as date variable while loading the file as Pandas dataframe.
While loading the file as Pandas’ data frame using read_csv() function we can specify the column names to be combined into datetime column. We will use “parse_dates” argument to read_csv() function and provide the year,month,and day columns as values for dictionary with new date variable as key.
df = pd.read_csv(path2data, parse_dates= {"date" : ["year","month","day"]})
Now Pandas’ read_csv() combines those columns into a single date column. And we can check the datatype of the new variable using Pandas’ info() function.
df.info()
We can see that it is an object of type “datetime[ns]”.
<class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 100 non-null datetime64[ns] dtypes: datetime64[ns](1) <=== memory usage: 928.0 bytes
Also note that while combining three columns into a single date column, the orginial three columns are not in the dataframe. If you we would like to keep the three columns after combining them into datetime, we need to use “keep_date_col=True”
df = pd.read_csv(path2data, parse_dates= {"date" : ["year","month","day"]}, keep_date_col=True)
Now we have four columns including the original Year, Month and Day column
df.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 100 non-null datetime64[ns] 1 year 100 non-null object 2 month 100 non-null object 3 day 100 non-null object dtypes: datetime64[ns](1), object(3) memory usage: 3.2+ KB
Combining Month, Year and Day columns with Pandas apply()
Another approach to combine multiple columns into a single date column first by pasting the three columns using apply() function.
In this example, we specify the columns of interest. In our sample dataframe, it is all the columns. And then use lambda function to combine the three values in a row using join() function.
cols=["year","month","day"] df['date'] = df[cols].apply(lambda x: '-'.join(x.values.astype(str)), axis="columns")
We now have a new column for date.
df.head() year month day date 0 2012 1 1 2012-1-1 1 2012 1 2 2012-1-2 2 2012 1 3 2012-1-3 3 2012 1 4 2012-1-4 4 2012 1 5 2012-1-5
We can use info() function to see that the new variable is not a datetime object yet.
df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 100 non-null int64 1 month 100 non-null int64 2 day 100 non-null int64 3 date 100 non-null object dtypes: int64(3), object(1) memory usage: 3.2+ KB
We can convert the date column into datetime type using Pandas to_datetime() function as shown in the post.
df['date']=pd.to_datetime(df['date'])