Pandas offers many ways to select rows from a dataframe. A commonly used approach to filtering rows is indexing, which pandas supports in multiple ways. For example, one can use label-based indexing with the loc accessor. Introducing the pandas query() function, Jake VanderPlas nicely explains: "While these abstractions are efficient and […]
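A minimal sketch of the two styles, assuming a small hypothetical DataFrame (the column names `country`, `year`, and `population` are invented for illustration). It writes the same filter once as a boolean mask inside `loc` and once as a string expression passed to `query()`.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2002, 2007, 2002, 2007],
    "population": [10.5, 20.1, 5.3, 42.0],
})

# Boolean mask combined with label-based .loc indexing
via_loc = df.loc[(df["year"] == 2007) & (df["population"] > 15)]

# The same filter expressed as a string with query()
via_query = df.query("year == 2007 and population > 15")

# Local Python variables can be referenced inside query() with the @ prefix
min_pop = 15
via_query_var = df.query("year == 2007 and population > @min_pop")

print(via_loc.equals(via_query))  # True
print(via_query)
```

Both filters return the rows for countries B and D; the query() version avoids repeating `df[...]` for every column in the condition.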
Pandas 0.25.0 Is Here. What Is New? Named Aggregation, explode() and Sparse DataFrame
If you are like me, you might have missed that the fantastic Pandas team has released a new version, Pandas 0.25.0. As one would expect, it brings quite a few new things. A couple of the new enhancements concern pandas' groupby aggregation. Here are a few new things that look really interesting. […]
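A minimal sketch of two of these additions, assuming pandas 0.25.0 or later and small hypothetical DataFrames; it follows the pattern shown in the pandas documentation and is not necessarily how the post itself demonstrates them. Named aggregation maps each output column name to an (input column, aggregation function) pair, and `explode()` turns each element of a list-like column into its own row.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Named aggregation: output_name=(input_column, aggregation_function)
summary = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_weight=("weight", "max"),
)
print(summary)

# pd.NamedAgg is an equivalent, more explicit spelling of the same thing
summary2 = animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column="height", aggfunc="min"),
    max_weight=pd.NamedAgg(column="weight", aggfunc="max"),
)

# explode(): one row per element of a list-like column; the index is repeated
tags = pd.DataFrame({"post": ["p1", "p2"], "tag": [["pandas", "python"], ["R"]]})
exploded = tags.explode("tag")
print(exploded)
```

Named aggregation removes the need to rename the MultiIndex columns that a dict-of-lists `agg()` call would otherwise produce.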
How to Randomly Select Groups in R with dplyr?
Sampling, i.e., randomly subsetting your data, is often extremely useful. If you want to sample rows at random without regard to groups, you can use the sample_n() function from dplyr. Sometimes you might want to sample one or more groups and keep all the rows within the selected group(s). However, sampling one or more groups with […]
Dimensionality Reduction with tSNE in Python
tSNE, short for t-Distributed Stochastic Neighbor Embedding, is a dimensionality reduction technique that can be very useful for visualizing high-dimensional datasets. tSNE was developed by Laurens van der Maaten and Geoffrey Hinton. Unlike PCA, one of the most commonly used dimensionality reduction techniques, tSNE is a non-linear, probabilistic technique. This means tSNE can capture non-linear […]
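To make the PCA versus tSNE contrast concrete, here is a minimal sketch assuming scikit-learn is installed. It uses scikit-learn's bundled digits dataset with its `PCA` and `TSNE` classes, which is one common way to run tSNE in Python rather than necessarily the post's own code; the parameter values (`perplexity=30`, `random_state=42`) are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 1,797 images of handwritten digits, each flattened to a 64-dimensional vector
X, y = load_digits(return_X_y=True)

# Linear baseline: project onto the top two principal components
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear, probabilistic embedding into 2-D.
# perplexity loosely sets how many neighbors each point "pays attention" to;
# these parameter values are illustrative assumptions, not tuned settings.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_tsne = tsne.fit_transform(X)

print(X.shape, X_pca.shape, X_tsne.shape)  # (1797, 64) (1797, 2) (1797, 2)
```

Because tSNE is probabilistic, changing `random_state` generally changes the layout. The two columns of `X_tsne` are typically scatter-plotted and colored by the digit labels `y` to compare against the PCA projection.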
Sparse Matrix Slicing in Python: Rows & Columns with SciPy
Fully updated August 2025: This guide has been refreshed with the latest library versions and tested code examples.
Efficiently Slicing Rows and Columns from Sparse Matrices in Python with SciPy
When working with large-scale data in fields like machine learning or scientific computing, you'll often encounter sparse matrices: matrices where the vast majority of elements are […]
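A minimal sketch of the core idea, assuming SciPy and NumPy and a small hypothetical matrix. CSR (compressed sparse row) format makes row slicing cheap and CSC (compressed sparse column) makes column slicing cheap, so a common pattern is to convert to whichever format matches the axis being sliced.

```python
import numpy as np
from scipy import sparse

# Small hypothetical matrix; most entries are zero
dense = np.array([
    [0, 0, 3, 0, 0],
    [4, 0, 0, 0, 5],
    [0, 0, 0, 0, 0],
    [0, 6, 0, 7, 0],
])
A = sparse.csr_matrix(dense)  # CSR stores the data row by row

# Row slicing is efficient in CSR format
rows = A[[1, 3], :]    # selected (non-contiguous) rows, still sparse
row_block = A[1:3, :]  # contiguous rows 1 and 2

# Column slicing is efficient in CSC format, so convert first
A_csc = A.tocsc()
cols = A_csc[:, [0, 3]]  # selected columns, still sparse

# Results stay sparse; densify only small outputs for inspection
print(rows.shape, cols.shape)  # (2, 5) (4, 2)
print(cols.toarray())
```

Slicing columns directly on a CSR matrix also works, but it is generally slower on large matrices because CSR lays out the nonzeros row by row.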

