3 Ways to Read a File and Skip Initial Comments in Python

Reading a text file line by line is one of the most common tasks when dealing with a big text file. Often, you are not interested in the first few lines and want to skip them and work with the rest of the file. These initial lines are typically comments or metadata and start with a special character such as “#”.

Here are 3 ways to read a text file line by line in Python and skip the initial comment lines. You don’t have to know in advance how many lines to skip. The first approach is a naive one that uses an if statement and is not efficient. The second approach, which uses while loops, is efficient but a bit clunky and a bit of a hack. The third approach, which uses itertools’ dropwhile, is both efficient and elegant.

1. How to skip initial comment lines using an if statement

A naive way to read a file and skip the initial comment lines is to use an if statement that checks whether each line starts with the comment character “#”. Python strings have a handy method, startswith, that checks whether a string, in this case a line, starts with specific characters. For example, "#comment".startswith("#") returns True. If the line does not start with “#”, the else block runs.

The problem with this approach is that it checks every line of the file, even after the initial comments are over. That extra check on each line is wasted work, and it adds up when the file is really big, so this is not an efficient way to skip the initial comment lines. It also skips comment lines that appear later in the file, which may not be what you want.

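# open a file using a with statement
filename = 'my_file.txt'
with open(filename, 'r') as fh:
    for curline in fh:
        # check if the current line
        # starts with "#"
        if curline.startswith("#"):
            # comment line: skip it
            continue
        else:
            # not a comment line: process it here
            print(curline, end='')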
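To try the if-statement approach without a file on disk, here is a minimal sketch that wraps it in a function. The helper name skip_comments_if and the sample text are hypothetical, and io.StringIO stands in for a real file handle because it supports the same line-by-line iteration. Note that every line starting with "#" is dropped, including the comment line that appears after the data starts.

```python
import io

def skip_comments_if(fh):
    """Hypothetical helper: return the lines of fh that do not start with '#'."""
    kept = []
    for curline in fh:
        # every line is checked, even after the header comments are over
        if not curline.startswith("#"):
            kept.append(curline)
    return kept

# io.StringIO behaves like a file opened in text mode
sample = io.StringIO("# header 1\n# header 2\ndata 1\n# later comment\ndata 2\n")
print(skip_comments_if(sample))  # ['data 1\n', 'data 2\n']
```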


2. Read line by line and skip comment lines using a while statement

A second approach to read a file and skip its first part based on some condition is to use a while statement. The idea is to read the file line by line in a while loop and break out of it the moment we see the first line without the comment symbol (or without the pattern of interest). Then a second while loop reads through the rest of the file line by line, starting with the line that ended the first loop.


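with open('my_file.txt') as fh:
    # Skip initial comments that start with #
    while True:
        line = fh.readline()
        # break out of the loop if it is not a comment line,
        # i.e. does not start with # ('' at end of file also breaks)
        if not line.startswith('#'):
            break

    # Second while loop to process the rest of the file,
    # starting with the first non-comment line already in `line`
    while line:
        print(line, end='')
        # read the next line; readline() returns '' at end of file
        line = fh.readline()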
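Here is the same two-loop idea as a small self-contained function, again using io.StringIO in place of a real file. The helper name skip_header_while and the sample text are hypothetical. Because the first loop stops at the first non-comment line, the later comment line is kept.

```python
import io

def skip_header_while(fh):
    """Hypothetical helper: skip initial '#' lines with readline(), return the rest."""
    # First loop: read until the first line that is not a comment
    while True:
        line = fh.readline()
        # readline() returns '' at end of file, which also ends this loop
        if not line.startswith('#'):
            break
    rest = []
    # Second loop: `line` already holds the first non-comment line
    while line:
        rest.append(line)
        line = fh.readline()
    return rest

sample = io.StringIO("# header 1\n# header 2\ndata 1\n# later comment\ndata 2\n")
print(skip_header_while(sample))  # ['data 1\n', '# later comment\n', 'data 2\n']
```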

3. Read line by line and skip lines using itertools’ dropwhile

Python’s itertools module has a really neat function called dropwhile, which returns an iterator. dropwhile takes a filtering condition (a function that returns True or False) and any iterable, such as a file handle or a list. It drops elements as long as the condition is true; once the condition is false for an element, it returns that element and every element after it without checking the condition again.

Let us see a simple example of itertools’ dropwhile on a list.

>>> from itertools import dropwhile
>>> list(dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
[6, 4, 1]

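Here, the condition x < 5 is true for the first two elements (1 and 4) and false for the third element (6), so dropwhile drops 1 and 4 and returns everything from 6 onward, including the later 4 and 1 even though they also satisfy x < 5.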
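This is what makes dropwhile efficient for skipping a header: after the first false result, it never calls the condition again. A small sketch that records every value the condition is asked about (the function name less_than_five is just for illustration) makes this visible.

```python
from itertools import dropwhile

calls = []  # records every value the condition function is called with

def less_than_five(x):
    calls.append(x)
    return x < 5

result = list(dropwhile(less_than_five, [1, 4, 6, 4, 1]))
print(result)  # [6, 4, 1]
print(calls)   # [1, 4, 6]: the trailing 4 and 1 are never checked
```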

We can use the same idea to read a file line by line and skip the initial comment lines. Let us first write a simple utility function that takes a line and returns True if it is a comment line, i.e. it starts with “#”.

def is_comment(s):
    """ function to check if a line
         starts with some character.
         Here # for comment
    """
    # return true if a line starts with #
    return s.startswith('#')
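If your files use other comment markers, a variant can pass a tuple to str.startswith, which returns True if the string starts with any of the prefixes. The name is_comment_any, the default prefixes, and the choice to ignore leading whitespace are assumptions for illustration; drop the lstrip() call if indented lines should not count as comments.

```python
def is_comment_any(s, prefixes=('#', '//', ';')):
    """Hypothetical variant: True if s starts with any of the given prefixes.
    Leading whitespace is ignored, so indented comments also count."""
    # str.startswith accepts a tuple and matches if any prefix matches
    return s.lstrip().startswith(prefixes)

print(is_comment_any("# comment"))       # True
print(is_comment_any("   // indented"))  # True
print(is_comment_any("data 1"))          # False
```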

After that, we can open the file with a with statement and loop over its lines through dropwhile, passing the is_comment function as the filtering condition and the file handle as the iterable. This skips all the initial comment lines, so the code block inside the for loop sees only the first non-comment line and everything after it.

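from itertools import dropwhile

filename = 'my_file.txt'
with open(filename, 'r') as fh:
    # dropwhile skips lines while is_comment() returns True, then yields the rest
    for curline in dropwhile(is_comment, fh):
        # process each remaining line here
        print(curline, end='')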
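A fully self-contained version of the dropwhile approach, runnable without a file on disk, is sketched below. It repeats is_comment so the block stands alone; the helper name read_after_header and the sample text are hypothetical, and io.StringIO stands in for the file handle.

```python
import io
from itertools import dropwhile

def is_comment(s):
    # return True if a line starts with #
    return s.startswith('#')

def read_after_header(fh):
    """Hypothetical helper: all lines after the initial run of comment lines."""
    return list(dropwhile(is_comment, fh))

sample = io.StringIO("# header 1\n# header 2\ndata 1\n# later comment\ndata 2\n")
print(read_after_header(sample))  # ['data 1\n', '# later comment\n', 'data 2\n']
```

Unlike the if-statement approach, the "# later comment" line is kept, because dropwhile stops checking after the first non-comment line. The list() call is only there to display the result; in real code you would usually loop over dropwhile(is_comment, fh) directly inside the with block so lines are read lazily.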

If your data is tabular (strings and numbers in CSV/TSV format, the kind of data you would load into a dataframe) rather than free text, and you want to skip the initial lines, you can let pandas’ read_csv do that for you.
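As a minimal sketch, assuming pandas is installed, read_csv’s comment parameter makes it ignore lines that start with the given character, and such fully commented lines are also ignored when locating the header row. The sample CSV text is made up for illustration.

```python
import io
import pandas as pd

# Sample CSV text with two leading comment lines (made-up data)
text = "# generated by some tool\n# units: meters\nx,y\n1,2\n3,4\n"

# comment='#' makes read_csv ignore lines that start with '#'
df = pd.read_csv(io.StringIO(text), comment='#')
print(df)
```

Be aware that comment='#' also cuts off anything after a "#" in the middle of a data line. If you know exactly how many header lines there are, skiprows is an alternative that leaves the remaining lines untouched.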