Skimr: An R Package to Skim Summary Data Effortlessly

March 6, 2018 by cmdlinetips

Quick Data Summary with Skimr
Exploring your data while doing analysis is extremely important. skimr, an R package from rOpenSci, gives you summary statistics in a compact, readable form, so you can quickly skim your data and understand it better.

If you have not heard of rOpenSci, it is a non-profit initiative founded in 2011 by Karthik Ram, Scott Chamberlain, and Carl Boettiger with the goal of making scientific data reproducible. rOpenSci’s unconference is a pretty cool idea. Check out the rOpenSci website to learn more.

One of the best parts of skimr is the tiny inline histogram it displays for every numerical variable in the data. The other great thing about skimr is that it works smoothly at any stage of a tidyverse pipeline.

How to Install Skimr?

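You can install skimr from CRAN, or install the development version from GitHub with devtools.

install.packages("skimr")
# Development version from GitHub:
# install.packages("devtools")
# devtools::install_github("ropenscilabs/skimr")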

Let us load the packages we will be using.

library(gapminder)
library(dplyr)
library(skimr)

Let us make a small data frame from two vectors and use skimr to check its summary.

x <- seq(1000)
y <- rnorm(1000, mean = 3)
df <- data.frame(x, y)
skim(df)

When you type skim(df), skimr gives you a quick summary of the data frame. It first reports basic information such as the number of variables, and then gives summary statistics for each type of variable. In this case, we have just one integer variable and one numeric variable. For numerical variables, skimr prints a small histogram in the console in addition to the standard summary statistics. The histograms tell us that our x variable is uniformly distributed, while the y variable has a peak in the middle and is spread out equally on both sides.
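As a rough cross-check of what the inline histograms suggest, the same shapes can be seen with base R alone. This is only a minimal sketch and not part of skimr: the seed value and the choice of ten equal-width bins are assumptions made here so that the result is reproducible.

```r
# Recreate the toy data; the seed is an arbitrary choice for reproducibility.
set.seed(42)
x <- seq(1000)
y <- rnorm(1000, mean = 3)

# Bin x into ten equal-width bins without drawing a plot.
x_counts <- hist(x, breaks = seq(0, 1000, by = 100), plot = FALSE)$counts
print(x_counts)  # every bin holds the same number of values

# The sample mean and median of y should both sit close to 3.
print(c(mean = mean(y), median = median(y)))
```

Equal counts in every bin are what a flat histogram for x reflects, and a mean and median that nearly coincide are consistent with the symmetric, single-peaked histogram for y.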

Now that we have seen the gist of what skimr can do for us, let us look at a more realistic data set and see how skim can be extremely useful in a data analysis pipeline.

Skimr to select columns, just like dplyr’s select verb

skim can select columns just like dplyr’s select verb, so we can look at the summary of only the columns we care about. For example:

skim(gapminder, lifeExp, gdpPercap)
skimr to select column, just like dplyr
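Because skim() takes a data frame as its first argument, the same selection can also be written as an explicit dplyr pipeline, which should give the same per-column statistics. A small sketch, assuming the gapminder, dplyr, and skimr packages are installed (life_gdp_summary is just a name chosen here):

```r
library(gapminder)
library(dplyr)
library(skimr)

# Select the two columns first, then skim whatever is left.
life_gdp_summary <- gapminder %>%
  select(lifeExp, gdpPercap) %>%
  skim()

print(life_gdp_summary)
```

Which form to use is mostly a matter of taste; the pipeline form is handy when the selection is only one of several data manipulation steps.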

Customize summary functions in skimr

Another cool thing about skimr is that you can customize the summary functions the way you want. For example, if you think the default summary that skimr offers is too much and you just want a few statistics of your own, you can easily do that.

For example, if you just want the min, max, and mean of every numerical variable, you can put those functions in a list and instruct skimr to use them instead of the defaults with the “skim_with” command.

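# Named list of the summary functions to apply to numeric columns
funs <- list(
  min = min,
  max = max,
  mean = mean
)
# skimr 1.x: replace (append = FALSE) the default numeric summaries for all later skim() calls
skim_with(numeric = funs, append = FALSE)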

Skimr on a group_by object

Any call to skim after this custom function definition will report the customized summary stats. You can use skim at any stage of a tidyverse pipeline and get only the custom summary functions. And yes, that really means any stage: skim also works on a grouped object after group_by() in a dplyr pipeline. Here is an example that applies the custom summary functions to a grouped object.

gapminder %>%
  filter(year == 2007) %>%
  select(continent, lifeExp, gdpPercap) %>%
  group_by(continent) %>%
  skim()
Skimr in tidyverse pipeline with custom summary stats

Here we call skim on a group_by() object after a series of data manipulation steps on the gapminder data. The result is a tidy skimr object with the summary statistics we defined, computed separately for each continent. Any time you want the default summary statistics back, you can restore them with

# Restore defaults
skim_with_defaults()
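The skim_with() and skim_with_defaults() calls shown here follow the skimr 1.x interface. According to the documentation for skimr 2.x, skim_with() no longer changes a global setting. It returns a new skimming function, the summary functions are wrapped in sfl(), and skim_with_defaults() is gone because plain skim() always keeps the defaults. Here is a hedged sketch of the same grouped summary under that assumption (my_skim and grouped_summary are names chosen here for illustration):

```r
library(gapminder)
library(dplyr)
library(skimr)  # assumes skimr >= 2.0

# Build a custom skimmer; my_skim is an illustrative name, not part of skimr.
my_skim <- skim_with(
  numeric = sfl(min = min, max = max, mean = mean),
  append = FALSE
)

# Same pipeline as before, ending in the custom skimmer instead of skim().
grouped_summary <- gapminder %>%
  filter(year == 2007) %>%
  select(continent, lifeExp, gdpPercap) %>%
  group_by(continent) %>%
  my_skim()

print(grouped_summary)

# The default summaries stay available through plain skim(); nothing to restore.
```

If you are not sure which interface you have, packageVersion("skimr") will tell you.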

There are many more interesting uses of skimr that fit into a tidy analysis pipeline. So check out skimr from rOpenSci now 🙂
