13 Free Online Resources/Books to learn R and Data Science

If you are interested in learning Data Science with R but not in spending money on books, you are definitely in a good place. There are a number of fantastic books and resources available online for free from top creators and scientists. Here are 13 such free (so far) online data science books […]

How To Subset Pandas Dataframe Based on Values of a Column?

Often, you want to subset a pandas dataframe based on the values of a specific column. Essentially, we would like to filter rows based on the values of a variable so that we keep all the columns but only certain rows. Here is how to filter rows in a pandas dataframe. Let us first […]
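As a minimal sketch of the idea in this teaser, row filtering is commonly done with a boolean mask. The toy data frame and its column names below are made up for illustration and are not taken from the post.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "country": ["India", "Kenya", "Peru", "Japan"],
    "year": [2002, 2007, 2007, 2002],
    "pop": [1034, 35, 28, 127],
})

# A boolean mask keeps only the matching rows; every column is retained.
df_2007 = df[df["year"] == 2007]

# isin() keeps rows whose value matches any entry in a list.
df_selected = df[df["country"].isin(["India", "Japan"])]
```

Both results are new data frames with the same columns as `df` and only the rows that satisfy the condition.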

Introduction to Split-Apply-Combine with Pandas

In a classic paper published in 2011, Hadley Wickham asked: What do we do when we analyze data? What are common actions and what are common mistakes? He then went on to spell out one of the most common strategies used in data analysis, Split-Apply-Combine. Intuitively, while solving a big problem, […]
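A rough sketch of Split-Apply-Combine in pandas, assuming a small invented data set: `groupby()` splits the rows, the aggregation is applied to each piece, and the per-group results are combined into one object. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data; values chosen only to make the grouping visible.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Africa", "Africa"],
    "lifeExp": [60.0, 70.0, 50.0, 54.0],
})

# Split by continent, apply mean() to each group, combine into a Series.
mean_life = df.groupby("continent")["lifeExp"].mean()

# The same three steps written out by hand, for comparison.
manual = {name: group["lifeExp"].mean() for name, group in df.groupby("continent")}
```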

JupyterLab is Here: First Impressions

JupyterLab is the next-generation web-based user interface for Python and R from Project Jupyter. It is still a beta release, but stable for daily use. One of the cool features of JupyterLab is that it is the go-to browser-based app for the classic Jupyter Notebook, a file browser for your computer files, a text editor and a […]

What Does *args and **kwargs Mean in Python?

If you are new to Python, saw *args and **kwargs used as function arguments, and wondered what those *-thingies are, you are not alone. Typically when you write functions, you will have a specific number and types of arguments the function can take as input. However, the more Python code you write, you might […]
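A small sketch of what the two forms do: `*args` collects extra positional arguments into a tuple, and `**kwargs` collects extra keyword arguments into a dict. The function names here are made up for illustration.

```python
def describe(*args, **kwargs):
    """Return the extra positional args (a tuple) and keyword args (a dict)."""
    return args, kwargs

positional, keyword = describe(1, 2, 3, sep=",", verbose=True)

def total(*numbers):
    """Sum any number of positional arguments."""
    return sum(numbers)

values = [1, 2, 3]
# The * at the call site unpacks the list into separate positional arguments.
result = total(*values)
```

The names `args` and `kwargs` are only a convention; the `*` and `**` are what matter.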

7 ways to read text files in R

There are multiple ways to read rectangular text files, such as csv files, tsv files, or text files with common delimiters. The readr package, part of tidyverse, offers seven functions to load flat text files easily. How to load a text file with readr package? read_csv(): to read comma delimited files; read_csv2(): to read semicolon separated files […]

How To Randomly Select Rows in Pandas?: Pandas Tutorial

Pandas’ sample function lets you randomly sample data from a pandas data frame. Here are three ways of using Pandas’ sample to randomly select rows. Let us first load the data. How to get a random subset of data? To randomly select rows from a pandas dataframe, we can use the sample function from pandas. For example, […]
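A hedged sketch of `DataFrame.sample()` on invented data. The three calls below are my own selection of common usages and may not be the same three covered in the post.

```python
import pandas as pd

# Hypothetical data frame with ten rows.
df = pd.DataFrame({"x": range(10)})

# A fixed number of rows; random_state makes the draw reproducible.
three_rows = df.sample(n=3, random_state=42)

# A fraction of the rows.
half = df.sample(frac=0.5, random_state=42)

# Sampling with replacement, so the same row may appear more than once.
resampled = df.sample(n=10, replace=True, random_state=42)
```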

6 ways to Sort Pandas Dataframe: Pandas Tutorial

Often you want to sort a Pandas data frame in a specific way. Typically, one may want to sort a pandas data frame based on the values of one or more columns, or based on the values of its row index or row names. Pandas data frame has two useful functions: sort_values(): to sort […]
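A brief sketch of sorting by column values and by row index. The teaser is cut off after `sort_values()`; pairing it with `sort_index()` here is my assumption about the second function, and the data is invented.

```python
import pandas as pd

# Hypothetical data with a deliberately shuffled index.
df = pd.DataFrame(
    {"continent": ["Asia", "Africa", "Asia", "Africa"],
     "lifeExp": [70.0, 54.0, 60.0, 50.0]},
    index=[3, 1, 2, 0],
)

# Sort by one column, ascending by default.
by_value = df.sort_values("lifeExp")

# Sort by two columns with a different direction for each.
by_two = df.sort_values(["continent", "lifeExp"], ascending=[True, False])

# Sort by the row index instead of column values.
by_index = df.sort_index()
```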

How to Read a gzip File in Python?

The gzip file format is one of the most common formats for compressing/decompressing files. gzip compression on text files greatly reduces the space used to store the text file. If you are working with a big data file, often the big text file is compressed with gzip or “gzipped” to save space. A naive way to […]
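One way to read a gzipped text file without decompressing it first is the standard library's `gzip.open()` in text mode. This is a sketch, not necessarily the approach the post takes; the file name is a placeholder, and the example writes its own small file so that it is self-contained.

```python
import gzip
import os
import tempfile

# Create a small gzipped text file to read back (placeholder name).
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    fh.write("line one\nline two\n")

# "rt" opens the compressed file as text and decompresses while reading.
with gzip.open(path, "rt", encoding="utf-8") as fh:
    lines = [line.rstrip("\n") for line in fh]
```

Iterating over the file handle reads one line at a time, which keeps memory use low for big files.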

Python 3 Guide for Data Scientists

In case you missed it, there won’t be any support for Python 2 by 2020. The last Python 2 update was for Python 2.7. So if you are interested in Data Science and learning Python, start with Python 3. If you already program with Python 2, it is time to migrate to Python 3. Alex Rogozhnikov, […]

How to Get Frequency Counts of a Column in Pandas Dataframe: Pandas Tutorial

Often while working with a pandas dataframe you might have a column with categorical variables (strings/characters), and you want to find the frequency counts of each unique element present in the column. Pandas’ value_counts() lets you easily get the frequency counts. Let us get started with an example from a real world data set. Load gapminder […]
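A minimal sketch of `value_counts()` on a made-up column. The post uses the gapminder data; the tiny data frame below is a stand-in so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for a categorical column.
df = pd.DataFrame({
    "continent": ["Asia", "Africa", "Asia", "Europe", "Asia", "Africa"],
})

# Counts per unique value, sorted from most to least frequent.
counts = df["continent"].value_counts()

# normalize=True returns proportions instead of raw counts.
shares = df["continent"].value_counts(normalize=True)
```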