There are multiple ways to read rectangular text files, such as CSV files, TSV files, or text files with other common delimiters. The readr package, part of the tidyverse, offers seven functions to load flat text files easily. How to load a text file with the readr package? read_csv(): to read comma-delimited files; read_csv2(): to read semicolon-separated files […]
How To Randomly Select Rows in Pandas?
Creating unbiased training and testing data sets is key for all Machine Learning tasks. Pandas’ sample function lets you randomly sample rows from a Pandas data frame and helps with creating unbiased sampled datasets. It is a great way to get a downsampled data frame and work with it. In this post, we will learn three ways […]
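As a rough illustration of the kind of sampling described above, here is a minimal sketch using `DataFrame.sample`. The toy data frame, the column names, and the 80/20 split are assumptions made for this example; the excerpt does not say which three approaches the post covers, so these calls are not necessarily the ones it shows.

```python
import pandas as pd

# Hypothetical toy data frame, used only for illustration
df = pd.DataFrame({"id": range(10), "value": [x * x for x in range(10)]})

# Sample a fixed number of rows; random_state makes the draw reproducible
three_rows = df.sample(n=3, random_state=42)

# Sample a fraction of the rows (here 50%)
half = df.sample(frac=0.5, random_state=42)

# A simple train/test split: draw 80% for training, keep the rest for testing
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)
```

By default `sample` draws without replacement, so `train` and `test` do not share any rows here.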
6 ways to Sort Pandas Dataframe: Pandas Tutorial
Often you want to sort a Pandas data frame in a specific way. Typically, one may want to sort a Pandas data frame based on the values of one or more columns, or based on the values of its row index (the row names). A Pandas data frame has two useful functions: sort_values(): to sort […]
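The two kinds of sorting mentioned above (by column values and by row index) might look like the sketch below. The data frame and its column names are made up for illustration. `sort_values()` is named in the excerpt; using `sort_index()` for the row-index case is an assumption of this example, since the excerpt is cut off before it names the second function.

```python
import pandas as pd

# Hypothetical data frame with an unordered integer index
df = pd.DataFrame(
    {"name": ["c", "a", "b"], "score": [2, 3, 1]},
    index=[30, 10, 20],
)

# Sort by the values of one column (ascending by default)
by_score = df.sort_values("score")

# Sort by one column in descending order
by_score_desc = df.sort_values("score", ascending=False)

# Sort by several columns, each with its own direction
by_two = df.sort_values(["name", "score"], ascending=[True, False])

# Sort by the row index rather than by column values (assumed counterpart)
by_index = df.sort_index()
```

Both calls return a new data frame and leave `df` unchanged unless `inplace=True` is passed.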
How to Read a gzip File in Python?
The gzip file format is one of the most common formats for compressing and decompressing files. gzip compression on text files greatly reduces the space used to store them. If you are working with a big data file, the big text file is often compressed with gzip, or “gzipped”, to save space. A naive way to […]
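For readers who want a quick feel for reading gzipped text, here is a minimal sketch using the standard-library `gzip` module. The file name and contents are hypothetical and are created inside the example so that it is self-contained. The excerpt is truncated before it describes its own approach, so this is one common way to do it and not necessarily the post's.

```python
import gzip
import os
import tempfile

# Create a small gzipped text file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    fh.write("first line\nsecond line\n")

# Read it back line by line in text mode, without unpacking it to disk first
with gzip.open(path, "rt", encoding="utf-8") as fh:
    lines = [line.rstrip("\n") for line in fh]
```

Iterating over the file handle keeps memory use low for large files, because only one line is held at a time.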
Python 3 Guide for Data Scientists
In case you missed it, there won’t be any support for Python 2 by 2020. Python 2.7 was the last Python 2 release. So if you are interested in Data Science and learning Python, start with Python 3. If you already program with Python 2, it is time to migrate to Python 3. Alex Rogozhnikov, […]