The gzip file format is one of the most common formats for compressing and decompressing files. gzip compression greatly reduces the space needed to store a text file. If you are working with big data, large text files are often compressed with gzip, or “gzipped”, to save space. A naive way to […]
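As a minimal sketch of working with a gzipped text file directly, assuming Python and its standard `gzip` module (the excerpt above does not say which approach the full post takes), the file can be read line by line without decompressing it to disk first. The helper name `count_lines_gzip` and the demo file are hypothetical, made up for illustration.

```python
import gzip
import os
import tempfile


def count_lines_gzip(path):
    """Count lines in a gzipped text file, streaming one line at a time."""
    n = 0
    # "rt" opens the compressed file in text mode, so lines arrive as str
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for _ in handle:
            n += 1
    return n


# Create a small gzipped file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    handle.write("first line\nsecond line\nthird line\n")

line_count = count_lines_gzip(demo_path)
print(line_count)
```

Because the file is iterated lazily, memory use stays small even when the uncompressed text is large.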
Python 3 Guide for Data Scientists
In case you missed it, there won’t be any support for Python 2 by 2020. Python 2.7 is the last release of Python 2. So if you are interested in Data Science and learning Python, start with Python 3. If you already program with Python 2, it is time to migrate to Python 3. Alex Rogozhnikov, […]
How to Get Frequency Counts of a Column in Pandas Dataframe: Pandas Tutorial
Often, while working with a pandas dataframe, you might have a column with categorical variables (strings/characters) and want to find the frequency count of each unique element present in the column. Pandas’ value_counts() lets you get the frequency counts easily. Let us get started with an example from a real world data set. Load gapminder […]
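A minimal sketch of the idea, assuming pandas is installed. The tiny dataframe below is a made-up stand-in, not the gapminder data the post loads, and the column name `continent` is hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for a real categorical column
df = pd.DataFrame(
    {"continent": ["Asia", "Europe", "Asia", "Africa", "Asia", "Europe"]}
)

# value_counts() returns a Series indexed by the unique values,
# sorted from most to least frequent by default
counts = df["continent"].value_counts()
print(counts)
```

Passing `normalize=True` to `value_counts()` returns proportions instead of raw counts.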
Installing Python 3 from Python 2 with Anaconda
If you have already installed Anaconda with Python 2.7 and have finally decided to take the plunge into Python 3, congrats. You don’t have to start fresh. You can easily upgrade to Python 3 using the Anaconda package manager by creating a new environment for Python 3. Note that this virtual environment is completely […]
Slide Decks and Packages in Tweets from 2018 rstudio::conf
The 2018 RStudio conference, one of the more interesting conferences for anyone interested in R and RStudio, just ended over the weekend. In case you missed it, Twitter was abuzz with interesting bytes from the conference, including cool new R packages that were presented and slides of the talks. Here is a compilation of tweets containing slides, […]



