
Python and R Tips

Learn Data Science with Python and R


3 Ways to Read a File and Skip Initial Comments in Python

January 31, 2018 by cmdlinetips

Reading a text file line by line is a common task when you work with a big text file. Often you are not interested in the first few lines and want to skip them and work with the rest of the file. These initial lines are typically comments or metadata, and they start with a special character such as “#”.
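A small sample file makes it easier to try the approaches in this post. The sketch below writes one. It assumes the name my_file.txt, which is the name used in the second approach, and the contents are made up: two leading comment lines, then data lines, with one more "#" line in the middle of the data.

```python
# Create a small sample file to experiment with.
# The file name and contents are illustrative assumptions.
sample = (
    "# generated by some tool\n"
    "# columns: name, value\n"
    "alpha,1\n"
    "beta,2\n"
    "# a comment in the middle of the data\n"
    "gamma,3\n"
)

filename = "my_file.txt"
with open(filename, "w") as fh:
    fh.write(sample)
```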

Here are 3 ways to read a text file line by line in Python and skip the initial comment lines. You don’t need to know in advance how many lines to skip. The first approach is a naive one. It uses an if statement to check every line of the file, which is not efficient. The second approach uses a while loop and stops checking once the comments end. It is efficient, but a bit clunky and a bit of a hack. The third approach uses itertools’ dropwhile to skip lines while reading a file line by line, and it is both efficient and elegant.

1. How to skip initial comment lines using an if statement

A naive way to read a file and skip the initial comment lines is to use an if statement that checks whether each line starts with the comment character “#”. Python strings have a handy method, startswith, which checks whether a string (here, a line) starts with specific characters. For example, "#comment".startswith("#") returns True. If the line does not start with “#”, we execute the else block.
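Here is a quick check of how startswith behaves. It also shows one pitfall: a comment line indented with spaces does not start with “#”. If your files may contain indented comments (an assumption about your data), strip the leading whitespace first.

```python
# str.startswith returns a bool
print("#comment".startswith("#"))   # True
print("alpha,1".startswith("#"))    # False

# Pitfall: leading whitespace hides the comment character
indented = "   # indented comment"
print(indented.startswith("#"))           # False
print(indented.lstrip().startswith("#"))  # True
```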

The problem with this approach is that it checks every line of the file for a leading “#”, even long after the initial comments have ended. For a really big file that is a lot of unnecessary work, so this is not an efficient way to read a file and skip comment lines. It also changes the behavior: a line starting with “#” anywhere in the file gets skipped, not just the initial ones.

# open a file using a with statement
with open(filename, 'r') as fh:
    for curline in fh:
        # check if the current line
        # starts with "#"
        if curline.startswith("#"):
            # comment line: skip it
            continue
        else:
            # not a comment line: process it here
            print(curline, end='')
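Below is a self-contained sketch of the if-statement approach. It is wrapped in a hypothetical helper, non_comment_lines, and runs on an in-memory file (io.StringIO), so it needs no file on disk. It also shows the behavioral difference mentioned above: because every line is checked, a “#” line in the middle of the data is dropped too.

```python
import io

def non_comment_lines(fh):
    """Hypothetical helper: return every line that does not start with '#'.

    Every line is checked, so comment lines anywhere in the file
    are dropped, not only the ones at the top.
    """
    kept = []
    for curline in fh:
        if curline.startswith("#"):
            # comment line: skip it
            continue
        kept.append(curline)
    return kept

text = "# header\n# meta\nalpha,1\n# middle\nbeta,2\n"
print(non_comment_lines(io.StringIO(text)))
# ['alpha,1\n', 'beta,2\n']
```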


2. Read line by line and skip comment lines using a while loop

A second way to read a file and skip the first part of it based on some condition is to use a while loop. The idea is to read the file line by line with readline() inside a while loop and break out of the loop the moment we see the first line without the comment symbol (or without the pattern of interest). We then use a second while loop to read through the rest of the file line by line, starting with the line we have already read.


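with open('my_file.txt') as fh:
    # Skip initial comments that start with #
    while True:
        line = fh.readline()
        # break out of the loop at the first line that does not
        # start with # (readline() returns '' at end of file,
        # which also ends the loop)
        if not line.startswith('#'):
            break

    # Second while loop to process the rest of the file,
    # starting with the first non-comment line already read
    while line:
        print(line, end='')
        # read the next line; '' at end of file stops the loop
        line = fh.readline()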
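The same while-loop idea can be written as a self-contained function. The name read_after_comments is hypothetical, and the example again uses io.StringIO. The key detail is that readline() has already consumed the first non-comment line when the first loop exits. That line must be kept, and the second loop must read a new line on each pass. The function also handles a file that contains only comments, because readline() returns an empty string at end of file.

```python
import io

def read_after_comments(fh):
    """Hypothetical helper: skip leading '#' lines, return the rest."""
    # Skip initial comment lines
    while True:
        line = fh.readline()
        # '' (end of file) does not start with '#', so this also
        # stops the loop on an empty or comments-only file
        if not line.startswith("#"):
            break

    rest = []
    # 'line' already holds the first non-comment line (or '')
    while line:
        rest.append(line)
        line = fh.readline()
    return rest

text = "# header\n# meta\nalpha,1\n# middle\nbeta,2\n"
print(read_after_comments(io.StringIO(text)))
# ['alpha,1\n', '# middle\n', 'beta,2\n']
```

Unlike the if-statement version, this keeps the “#” line in the middle of the data. Only the initial comments are skipped.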

3. Read line by line and skip lines using itertools’ dropwhile

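Python’s itertools module has a really neat function called dropwhile. It takes a filtering condition (a function that returns True or False) and any iterable, such as a file handle or a list. dropwhile drops elements as long as the condition is true. As soon as the condition is false for an element, dropwhile yields that element and every element after it, without checking the condition again.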

Let us see a simple example of itertools’ dropwhile on a list.

>>> from itertools import dropwhile
>>> list(dropwhile(lambda x: x<5, [1,4,6,4,1]))
[6, 4, 1]

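Here, the condition x<5 is true for the first two elements (1 and 4) and false for the third element (6). So dropwhile drops 1 and 4 and returns everything from 6 onward. Note that the trailing 4 and 1 are kept even though they are less than 5, because dropwhile stops checking once the condition has failed.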
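To see why the trailing elements survive, here is a rough pure-Python equivalent of dropwhile. It is modeled on the one shown in the itertools documentation and named my_dropwhile here to avoid shadowing the real function. It is a generator. It skips items while the condition is true, yields the first item that fails it, and then yields everything else without calling the condition again.

```python
from itertools import dropwhile

def my_dropwhile(predicate, iterable):
    """Rough pure-Python equivalent of itertools.dropwhile."""
    it = iter(iterable)
    # Phase 1: drop items while the predicate holds
    for x in it:
        if not predicate(x):
            # the first item that fails the predicate is kept
            yield x
            break
    # Phase 2: pass everything else through unchecked
    for x in it:
        yield x

data = [1, 4, 6, 4, 1]
print(list(my_dropwhile(lambda x: x < 5, data)))  # [6, 4, 1]
print(list(dropwhile(lambda x: x < 5, data)))     # [6, 4, 1]
```

Both loops share one iterator, so the second loop resumes where the first one stopped. With a file handle, this means the lines after the comments are streamed through without any further checks.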

We can use the same idea to read a file line by line and skip the initial comment lines. Let us first write a simple utility function that takes a line and returns True if it is a comment line, i.e. if it starts with “#”.

def is_comment(s):
    """Return True if the line s is a comment line,
    i.e. it starts with the comment character (here "#").
    """
    return s.startswith('#')
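Before pointing it at a file, you can try is_comment on a plain list of strings that stands in for the lines of a file (the lines are made up):

```python
from itertools import dropwhile

def is_comment(s):
    """Return True if the line starts with '#'."""
    return s.startswith('#')

lines = ["# header\n", "# meta\n", "alpha,1\n", "# middle\n", "beta,2\n"]
print(list(dropwhile(is_comment, lines)))
# ['alpha,1\n', '# middle\n', 'beta,2\n']
```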

After that, we can open the file with a with statement and loop over dropwhile(is_comment, fh), passing our function as the filtering condition and the file handle as the iterable. This skips all the initial comment lines, and the code inside the for loop only sees the lines from the first non-comment line onward. Because dropwhile stops calling is_comment once the condition fails, the rest of the file is not checked.

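from itertools import dropwhile

with open(filename, 'r') as fh:
    for curline in dropwhile(is_comment, fh):
        # only lines after the initial comments reach this point
        print(curline, end='')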
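Here is a self-contained version of the dropwhile approach. It is wrapped in a hypothetical generator, data_lines(path), which yields the lines after the initial comments with the trailing newline removed. It writes a throwaway file with tempfile so it runs anywhere. Because both dropwhile and the generator are lazy, lines are read from disk only as the caller consumes them.

```python
import os
import tempfile
from itertools import dropwhile

def is_comment(s):
    """Return True if the line starts with '#'."""
    return s.startswith('#')

def data_lines(path):
    """Hypothetical helper: yield lines after the initial comments.

    The file stays open until the generator is exhausted or closed.
    """
    with open(path, 'r') as fh:
        for curline in dropwhile(is_comment, fh):
            yield curline.rstrip('\n')

# Write a throwaway sample file (contents are made up)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("# header\n# meta\nalpha,1\nbeta,2\n")
    path = tmp.name

result = list(data_lines(path))
print(result)  # ['alpha,1', 'beta,2']
os.remove(path)
```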

If your data is tabular (strings and numbers in rows and columns) and stored in CSV/TSV format rather than as a free-form text file, you can use pandas’ read_csv to read it and skip the initial lines. See this post for examples:

  • 7 Tips to Read a CSV File as Pandas Data Frame
