Data Science with R and Python - A Round Up: January 2020

Here is the first post of the year in the “Data Science with R and Python Round Up” series. My New Year's resolution is to continue the monthly round up. This roundup is an attempt to compile interesting news and Python and R blog posts on anything related to data, data science, ML, and AI. Hopefully it helps you catch up on the interesting things that you re-tweeted or re-shared but never got around to reading.

  1. Tech blogs from tech companies are always interesting. Here is a post from Wayfair on using Empirical Bayes approaches to rank products: Bayesian Product Ranking at Wayfair
  2. I just recently found out that the computer scientist and machine learning researcher Chip Huyen is working on an ML book. As part of the book, Chip Huyen has written a fantastic blog post analyzing salary data from over 19k tech employees from CA. It is a must read. Check it out: Analysis of compensation, level, and experience details of 19k tech workers
  3. Staying on the topic of tech salaries, the fantastic Julia Silge is back with another great analysis post, this time on salary using the 2019 Stack Overflow survey data. Another must read: Modeling Salary and Gender in the Tech Industry.
  4. It is a new year and time for predictions about the future of AI. Top AI researchers have weighed in on where AI is going: Top minds in machine learning predict where AI is going in 2020
  5. Allen Downey, the author of the “Think” series of books, has a new online notebook, Elements of Data Science. It is an introductory data science book in Python for people with no programming experience. Check out the free book Elements of Data Science.
  6. Version 0.10.0 of Seaborn, the Python statistical data visualization library, is here. Check it out here to update to the latest version.
  7. Google is making it easy to access public datasets. Check out how you can search millions of datasets: Discovering millions of datasets on the web
  8. Google’s Cassie Kozyrkov has a new post: Can analysts and statisticians get along?
    Inside the subtle war between the data science professions
  9. Pandas version 1.0.0 is here. It is a big update, with Pandas moving from 0.25.3 to 1.0.0. Check out what is new in Pandas 1.0.0
  10. tidyr version 1.0.0 came out last fall with new functions like pivot_longer() and pivot_wider() to make it easy to reshape data. They more or less replace the tidyr functions spread() and gather(). The new functions made it necessary to update the wonderful R for Data Science as well. Until now the book used spread() and gather(). Not anymore: Garrett Grolemund, one of the authors of the book, has updated Chapter 12 on tidy data to use the pivot_longer() and pivot_wider() functions. Check out the new chapter here
  11. The RStudio 2020 conference ended just this week. Among the many interesting things, the biggest announcement was that RStudio has become a Public Benefit Corporation.

    By becoming a PBC, we have codified our open-source mission into our charter, which means that our corporate decisions must both align with this mission, as well as balance the interests of community, customers, employees, and shareholders. As a PBC, RStudio will publish an annual report that describes the public benefit we have created, along with how we seek to provide public benefits in the future

    And here is the first annual report from RStudio.
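
For readers who work in Python rather than R, the reshaping idea behind pivot_longer() and pivot_wider() from item 10 can be sketched with pandas. This is only a rough analogue and not tidyr's API: melt() and pivot() are pandas methods, and the tiny table, its column names, and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1999": [10, 20],
    "2000": [15, 25],
})

# Wide -> long: roughly what tidyr's pivot_longer() does.
# Each year column becomes a (year, cases) pair on its own row.
long = wide.melt(id_vars="country", var_name="year", value_name="cases")

# Long -> wide: roughly what tidyr's pivot_wider() does.
# Each distinct value of "year" becomes a column again.
wide_again = (
    long.pivot(index="country", columns="year", values="cases")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "year" label on the columns
)

print(long)
print(wide_again)
```

The round trip is a handy sanity check: reshaping to long and back to wide should return the same values as the starting table.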
