7 Tips to Read a CSV File as a Pandas Data Frame

Pandas is one of the most popular Python packages for working with data frames. It is built on top of NumPy, which helps make data manipulation fast and easy. Here are 7 basic ways to load a CSV file into pandas as a data frame.

Load pandas package

Let us first load the pandas package.

# load pandas 
import pandas as pd

How to load a CSV file in Pandas as Data Frame?

A CSV (comma-separated values) file stores numerical and text values as plain text. Each field in the file is separated by a comma, hence the name. The data in a CSV file can easily be loaded in Python as a data frame with the pandas function pd.read_csv.

# CSV file
csv_file = 'sample_data.csv'
# read csv with pandas read_csv
df = pd.read_csv(csv_file)
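
To try the call above without having a file at hand, here is a minimal sketch that first writes a tiny CSV to a temporary directory and then reads it back. The column names and values are made up for illustration and are not part of any real data set.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data, used only for illustration
csv_text = "name,age,city\nAlice,30,Paris\nBob,25,Lima\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file = os.path.join(tmp_dir, "sample_data.csv")
    with open(csv_file, "w") as handle:
        handle.write(csv_text)
    # By default, the first line of the file supplies the column names
    df = pd.read_csv(csv_file)

print(df.columns.tolist())
print(df.shape)  # (2, 3): two data rows, three columns
```

Note that the header line is used for the column names and is not counted as a data row.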

How to Read a CSV file on the Web in Pandas?

In the previous example, the CSV file was available locally on your computer under the file name ‘sample_data.csv’. What if the CSV file is not on your computer, but on the web? In that case, we can pass the URL to pd.read_csv in place of the file name.

We will use the gapminder data as an example to show how to read a file from a URL.

# link to gapminder data as csv on the web 
csv_url='https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv'
# pandas read csv from URL
gapminder = pd.read_csv(csv_url)
gapminder.head()

How to read a CSV file into pandas without header info?

If the CSV file does not contain a header row, we can tell pandas so by setting the header option to None; the columns are then labeled with integers starting at 0. Note that if you read a CSV file that does have a header row with ‘header=None’, the data frame will contain the header information as its first row of data.

gapminder = pd.read_csv(csv_url, header=None)
gapminder.head()
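
The effect of header=None is easier to see on a tiny input. The sketch below reads the same made-up CSV text twice from an in-memory buffer (io.StringIO stands in for a file here); the values are illustrative only.

```python
import io

import pandas as pd

# Made-up CSV text with a header line, for illustration only
csv_text = "country,year,pop\nKenya,2007,100\nPeru,2007,200\n"

# Default: the first line becomes the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: columns are numbered and the header line becomes a data row
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)           # (2, 3)
print(no_header.shape)             # (3, 3)
print(no_header.columns.tolist())  # [0, 1, 2]
print(no_header.iloc[0].tolist())  # ['country', 'year', 'pop']
```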

How to skip rows while loading a CSV file?

Often a CSV file contains other information, not in tabular form, in its first few lines. To load the data in the CSV file as a data frame correctly, we may want to skip those initial lines. We can skip any number of rows at the top of the file with the skiprows option. For example, to skip a single row, we set skiprows=1. Here the skipped row is the header line of the gapminder file, so we also pass header=None to keep the first data row from being used as column names.

# pandas read_csv with skiprows option
gapminder = pd.read_csv(csv_url, header=None, skiprows=1)
gapminder.head()
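
As a small sketch of the more typical case, where the lines to skip are free-text notes sitting above the real header, consider the made-up input below. The note lines and values are hypothetical.

```python
import io

import pandas as pd

# Made-up file content: two lines of notes, then the actual table
csv_text = (
    "Exported from a hypothetical tool\n"
    "Generated for illustration only\n"
    "country,year,pop\n"
    "Kenya,2007,100\n"
    "Peru,2007,200\n"
)

# Skip the two note lines; the next line is then used as the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(df.columns.tolist())  # ['country', 'year', 'pop']
print(df.shape)             # (2, 3)
```

Because the header row comes right after the skipped lines, header=None is not needed in this case.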

How to specify column names while loading a CSV file in Pandas?

If you want to rename (or name) the columns of the CSV file, you can easily specify the names with the names argument while reading the file. For example, to change the column names of the gapminder data, we skip the original header row with skiprows=1 and pass the new names as follows.

# specify column names
new_names = ['country', 'year', 'population', 'continent', 'life_expectancy', 'gdp_per_cap']
gapminder = pd.read_csv(csv_url, skiprows=1, names=new_names)
gapminder.head()
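
Why skiprows=1 matters here can be shown on a small made-up input: when names is given and the original header line is not skipped, that line is read as an ordinary data row. The new column names below are hypothetical.

```python
import io

import pandas as pd

# Made-up CSV text with its own header line, for illustration only
csv_text = "country,year,pop\nKenya,2007,100\nPeru,2007,200\n"
new_names = ["nation", "yr", "population"]

# Skip the original header and supply our own column names
renamed = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=new_names)

# Without skiprows, the original header line ends up as the first data row
not_skipped = pd.read_csv(io.StringIO(csv_text), names=new_names)

print(renamed.columns.tolist())      # ['nation', 'yr', 'population']
print(renamed.shape)                 # (2, 3)
print(not_skipped.iloc[0].tolist())  # ['country', 'year', 'pop']
```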

How to load a specific number of lines from a CSV file in pandas?

If you are interested in loading only a specific number of rows from the CSV file, you can specify how many rows to read with the nrows argument. The header line is not counted as a row. For example, to read just the first 20 rows,

gapminder = pd.read_csv(csv_url, nrows=20)
print(gapminder.shape)
print(gapminder.head())
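
The same idea can be checked offline with a generated input. This sketch builds a made-up 100-row CSV in memory and reads only the first 20 data rows; the column names id and value are placeholders.

```python
import io

import pandas as pd

# Build a made-up CSV with one header line and 100 data rows
lines = ["id,value"] + [f"{i},{i * i}" for i in range(100)]
csv_text = "\n".join(lines)

# nrows counts data rows only; the header line is read in addition
first_20 = pd.read_csv(io.StringIO(csv_text), nrows=20)

print(first_20.shape)           # (20, 2)
print(first_20["id"].iloc[-1])  # 19
```

Reading a few rows this way is a quick way to inspect the layout of a large file before loading all of it.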

How to read a tab-separated file (tsv file) in pandas?

The pandas function name “read_csv” is a bit of a misnomer. Although we used it to load a CSV (comma-separated values) file, read_csv can also read text files that use other delimiters. For example, if the fields are separated by tabs, “\t”, we can pass the argument sep='\t'.

# TSV file
tsv_file = 'sample_data.tsv'
# use pd.read_csv to load the tsv_file
df = pd.read_csv(tsv_file, sep="\t")
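
To see what the sep argument changes, the sketch below reads the same made-up tab-separated text with and without it, using an in-memory buffer in place of a file. The data is illustrative only.

```python
import io

import pandas as pd

# Made-up tab-separated text, for illustration only
tsv_text = "name\tage\nAlice\t30\nBob\t25\n"

# With sep="\t" the fields are split on tabs
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# With the default comma separator, each line lands in a single column
df_wrong = pd.read_csv(io.StringIO(tsv_text))

print(df_tsv.shape)    # (2, 2)
print(df_wrong.shape)  # (2, 1)
```

A data frame with a single unexpected column is often a sign that the separator does not match the file.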