20 Free Online Books to Learn R and Data Science

If you are interested in learning Data Science with R, but not interested in spending money on books, you are in a good place. There are a number of fantastic books and resources available online for free from leading creators and scientists. Here are 13 such free (so far) online data science books […]

How to Create Ordered Dictionary in Python?

The dictionary is one of the most useful core data structures in Python. Sometimes, you may want to create a dictionary and also maintain the order in which you inserted items when you iterate over the keys. Python’s collections module has OrderedDict, which lets you create an ordered dictionary. Let us see an example of […]
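As a quick illustration of the idea, here is a minimal sketch of an OrderedDict keeping insertion order. The variable names and the fruit counts are made up for this example. Note that since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict is mainly useful for its order-specific helpers such as move_to_end().

```python
from collections import OrderedDict

# Hypothetical example data: keys are inserted in this order.
word_counts = OrderedDict()
word_counts["banana"] = 3
word_counts["apple"] = 5
word_counts["cherry"] = 7

# Iteration follows insertion order, not alphabetical order.
keys_in_order = list(word_counts.keys())
print(keys_in_order)  # ['banana', 'apple', 'cherry']

# move_to_end() pushes an existing key to the last position.
word_counts.move_to_end("banana")
keys_after_move = list(word_counts.keys())
print(keys_after_move)  # ['apple', 'cherry', 'banana']
```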

How to Make a R Package from Scratch using RStudio?

Creating your first R package from scratch can look really daunting at first. Modern tools like the RStudio IDE and the devtools R package make it a lot easier to get started and create a new R package. I recently came across the second edition of the R Packages book by Hadley Wickham and Jenny Bryan and it […]

Pearson and Spearman Correlation in Python

Understanding the relationship between two or more variables is at the core of many aspects of data analysis or statistical analysis. A correlation coefficient captures the association between two variables (in the simplest case) numerically. One of the commonly used correlation measures is the Pearson correlation coefficient. Another commonly used correlation measure is the Spearman correlation coefficient. […]
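To make the difference between the two measures concrete, here is a small sketch that uses only the standard library. The helper names (pearson, average_ranks, spearman) and the toy data are assumptions for illustration. In practice one would more likely reach for scipy.stats.pearsonr and scipy.stats.spearmanr. Spearman correlation is computed here as the Pearson correlation of the ranks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
print(round(pearson_r, 3), round(spearman_rho, 3))
```

Because the relationship is monotonic but not linear, the Spearman coefficient is exactly 1 while the Pearson coefficient falls slightly below 1.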

How to Select Rows of Pandas Dataframe with Query function?

Pandas offers many ways to select rows from a dataframe. One commonly used approach to filtering rows of a dataframe is indexing, which can be done in multiple ways. For example, one can use label-based indexing with loc. As Jake VanderPlas nicely explains when introducing the query() function: While these abstractions are efficient and […]
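For a rough sense of how the two styles compare, here is a minimal sketch that filters the same rows once with loc and once with query(). The dataframe and its column names are invented for this example, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy dataframe.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2002, 2007, 2007, 2002],
    "population": [10, 20, 30, 40],
})

# Boolean-mask selection with loc.
by_loc = df.loc[(df["year"] == 2007) & (df["population"] > 15)]

# The same filter written as a query string.
by_query = df.query("year == 2007 and population > 15")

print(by_query)
```

Both selections return the same rows. The query string is often easier to read when several conditions are combined.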

Pandas 0.25.0 is Here. What is New? Named aggregation, explode() and sparse dataframe

If you are like me, you might have missed that the fantastic Pandas team has released a new version, Pandas 0.25.0. As one would expect, there are quite a few new things in Pandas 0.25.0. A couple of the new enhancements are around pandas’ groupby aggregation. Here are a few new things that look really interesting. […]
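As a small taste of two of the features named in the title, here is a sketch of named aggregation and explode(). It assumes pandas 0.25.0 or later, and the dataframes and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data.
animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
})

# Named aggregation: each keyword names an output column and
# maps it to an (input column, aggregation function) pair.
summary = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
)
print(summary)

# explode(): turn each element of a list-like cell into its own row.
posts = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})
exploded = posts.explode("tags")
print(exploded)
```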

How to Randomly Select Groups in R with dplyr?

Sampling, that is, randomly subsetting your data, is often extremely useful. If you are interested in randomly sampling without regard to the groups, you can use the sample_n() function from dplyr. Sometimes you might want to sample one or multiple groups with all elements/rows within the selected group(s). However, sampling one or more groups with […]

Dimensionality Reduction with tSNE in Python

tSNE, short for t-Distributed Stochastic Neighbor Embedding, is a dimensionality reduction technique that can be very useful for visualizing high-dimensional datasets. tSNE was developed by Laurens van der Maaten and Geoffrey Hinton. Unlike PCA, one of the commonly used dimensionality reduction techniques, tSNE is a non-linear and probabilistic technique. What this means is that tSNE can capture non-linear […]

How To Slice Rows and Columns of Sparse Matrix in Python?

Sometimes, while working with large sparse matrices in Python, you might want to select certain rows or certain columns of a sparse matrix. As we saw earlier, there are many types of sparse matrices available in SciPy. Each sparse matrix type is optimized for specific operations. We will see […]

9 Basic Linear Algebra Operations with NumPy

Linear algebra is one of the most important mathematical topics for doing good data science. Learning the basics of linear algebra adds a valuable tool to your data science skill set. Python’s NumPy has fast, efficient functions for all standard linear algebra/matrix operations. Here we will see 9 important and […]

10 quick tips for effective dimensionality reduction

Dimensionality reduction techniques like PCA, SVD, tSNE, and UMAP are a fantastic toolset for exploratory data analysis and unsupervised learning with high-dimensional data. It has become really easy to use many of the available dimensionality reduction techniques in both R and Python while doing data science. However, it can often be a little bit challenging to interpret low […]