NumPy and Pandas are two of the most useful Python toolkits for data analysis. Sometimes you might want to convert a 2D NumPy array to a Pandas dataframe.
In this short tutorial, we will learn how to convert a NumPy array into a Pandas dataframe.
import numpy as np
import pandas as pd
Let us first create a NumPy array. We will use NumPy’s random module to create a two-dimensional NumPy array.
np_array = np.random.rand(10,3)
Here we have created a two-dimensional NumPy array with 10 rows and 3 columns.
np_array.shape
(10, 3)
To convert a numpy array to a Pandas dataframe, we use Pandas’ DataFrame() function with the numpy array as argument.
# convert numpy array to Pandas dataframe
pd.DataFrame(np_array)
We get a Pandas dataframe with default column names and row names (the index). When no labels are given, DataFrame() numbers both the columns and the rows with integers starting at 0.
          0         1         2
0  0.240193  0.390997  0.233373
1  0.562184  0.964387  0.146074
2  0.542980  0.498600  0.494699
3  0.764410  0.429342  0.450513
4  0.595966  0.805123  0.114175
5  0.062249  0.334657  0.185373
6  0.904895  0.534821  0.087906
7  0.425533  0.472328  0.929547
8  0.209767  0.853591  0.522343
9  0.234314  0.732298  0.010851
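To make the default labels concrete, here is a small sketch that inspects the column names and index of the converted dataframe. The seeded generator is an assumption added only so the run is repeatable; its values will not match the output shown above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, used here only for repeatability.
rng = np.random.default_rng(42)
np_array = rng.random((10, 3))

df = pd.DataFrame(np_array)

# With no labels supplied, both axes are numbered from 0.
print(list(df.columns))  # [0, 1, 2]
print(list(df.index))    # the integers 0 through 9
print(df.shape)          # (10, 3)
```

Note that the default column names are integers, not strings, so a column is selected with df[0] rather than df["0"].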
If you want specific column names when creating the dataframe, you can pass them as the “columns” argument to the DataFrame() function.
# convert numpy array to Pandas dataframe with column names
pd.DataFrame(np_array, columns=["c1","c2","c3"])
In this example, we provided a list of names for columns.
         c1        c2        c3
0  0.240193  0.390997  0.233373
1  0.562184  0.964387  0.146074
2  0.542980  0.498600  0.494699
3  0.764410  0.429342  0.450513
4  0.595966  0.805123  0.114175
5  0.062249  0.334657  0.185373
6  0.904895  0.534821  0.087906
7  0.425533  0.472328  0.929547
8  0.209767  0.853591  0.522343
9  0.234314  0.732298  0.010851
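Row names can be set in the same way through the “index” argument of DataFrame(). The sketch below is one way to do it; the row labels r0 to r9 are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

np_array = np.random.rand(10, 3)

# Hypothetical row labels, invented for this example.
row_names = [f"r{i}" for i in range(np_array.shape[0])]

df = pd.DataFrame(np_array, columns=["c1", "c2", "c3"], index=row_names)
print(df.head(3))

# Only the labels change; the underlying values are the same.
print(np.array_equal(df.to_numpy(), np_array))  # True
```

The list of column names must have one entry per array column and the list of row names one entry per array row, otherwise Pandas raises an error.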
You might also want to check out how to rename Pandas column names using a dictionary: How to Rename Columns in Pandas?
Want to get better at using Pandas for data science-ing? Check out Byte Sized Pandas 101 tutorials.