Pandas 1.x Cookbook Review

The Pandas 1.x Cookbook, by Matt Harrison and Theodore Petrou, is the second edition of the Pandas Cookbook and was published in February 2020. The first edition came out in 2017 and covered an older version of pandas (0.20).

Pandas 1.x Cookbook (published Feb 2020).

How does the Pandas 1.x Cookbook meet the needs of a business analysis user?

Like many “cookbook”-style IT books, this one targets a fairly broad audience. People in data science, scientific programming, or software development can use it as well, so the content spans a wide range of difficulty levels. Nonetheless, many chapters are very helpful for the business uses that we discussed in Pandas for Productivity. These chapters are the most relevant for business users who are new to Python:

  • Pandas Foundations (Chapter 1): Shows you how to create a Series (data column or row) in different ways. Walks through the different attributes of a Series.
  • Essential DataFrame Operations (Chapter 2): Extends the material from Chapter 1 to DataFrames (data tables).
  • Filtering Rows (Chapter 7): Demonstrates 3 different ways of filtering a DataFrame, namely Boolean arrays, the .query method, and the .where method.
  • Grouping for Aggregation, Filtration, and Transformation (Chapter 9): Concentrate on the first half of the chapter to start. It goes through several syntaxes for the .groupby method and shows how to adjust the column names of the output DataFrame.
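
To give a feel for what Chapters 7 and 9 cover, here is a minimal sketch of the three filtering styles and one .groupby syntax that names the output columns. It is not taken from the book; the `sales` table and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, 25, 40, 5],
    "price": [2.0, 3.0, 2.5, 4.0],
})

# 1. Boolean array: keeps only the rows where the condition is True
by_mask = sales[sales["units"] > 8]

# 2. .query: the same condition written as a string
by_query = sales.query("units > 8")

# 3. .where: keeps every row, but blanks out (NaN) the rows that fail the condition
by_where = sales.where(sales["units"] > 8)

# Grouping with "named aggregation": the keyword names become the output column names
summary = sales.groupby("region").agg(
    total_units=("units", "sum"),
    avg_price=("price", "mean"),
)
```

Note that .where behaves differently from the other two: it returns a table of the same length with missing values, so you would typically follow it with .dropna() if you want the rows gone.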

After you get used to reading DataFrames, calculating new columns, filtering, grouping, and sorting, these chapters might help you solve more difficult problems:

  • Time Series Analysis (Chapter 12): Specifically, focus on .resample as a quicker way to group data by time periods.
  • Visualization with Matplotlib, Pandas, and Seaborn (Chapter 13): Gives you a quick overview of chart creation in Python, just in case you have too much data to chart in Excel.
  • Restructuring Data into a Tidy Form (Chapter 10): Skim through this if you ever find yourself writing a loop to rearrange your data from columns into rows. There might be faster ways to change the shape of your data for easier analysis!
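
As a rough illustration of the .resample and reshaping ideas above, the sketch below sums some orders by month and then turns a wide table into a long ("tidy") one without a loop. The data and column names are invented for this example, and .melt is only one of several reshaping methods the chapter might lead you to.

```python
import pandas as pd

# Hypothetical order amounts indexed by date, invented for this illustration
orders = pd.DataFrame(
    {"amount": [100, 150, 200, 50]},
    index=pd.to_datetime(
        ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-28"]
    ),
)

# Group the rows into calendar months ("MS" = month start) and sum each month
monthly = orders.resample("MS").sum()

# Hypothetical wide table: one column per quarter
wide = pd.DataFrame({
    "product": ["A", "B"],
    "Q1": [10, 20],
    "Q2": [30, 40],
})

# Move the quarter columns into rows: one row per product and quarter
tidy = wide.melt(id_vars="product", var_name="quarter", value_name="units")
```

Here `monthly` has one row per month, and `tidy` has four rows (two products times two quarters) with the columns product, quarter, and units.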

At which point in your learning process should you read this book?

You’ll probably get the most out of the book if you read it after you’ve tried several analysis projects on your own. That way, you will already have a basic idea of how to use DataFrames, Series, and key pandas functions and methods.

Because of its “cookbook” format, this book can quickly overload a beginner with too much information. Each chapter focuses on a specific area (e.g. sorting, indexing, or grouping) and covers multiple ways to solve a problem. Hence, you will benefit more if you’re already familiar with the basics and are looking to try out better or shorter ways to get a particular task done.

Furthermore, this book can take you to the next level by showing you how to combine many methods into a single expression, where each method works on the result of the previous one. This technique is called “method chaining”. It is part of writing “idiomatic” pandas code, which means code that follows the conventions experienced pandas users rely on and that tends to be concise and readable as a result.
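
Here is a small sketch of what method chaining can look like. It is my own example rather than one from the book, and the `sales` data is made up. Each step receives the output of the step before it, and the surrounding parentheses let the chain span several lines.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, 25, 40, 5],
    "price": [2.0, 3.0, 2.5, 4.0],
})

result = (
    sales
    .query("units > 8")                                     # filter rows
    .assign(revenue=lambda df: df["units"] * df["price"])   # add a calculated column
    .groupby("region")                                      # group by region
    .agg(total_revenue=("revenue", "sum"))                  # sum revenue per region
    .sort_values("total_revenue", ascending=False)          # largest first
)
```

Without chaining, the same work would need an intermediate variable for every step. Whether the chained version is easier to read is partly a matter of taste, but it does keep the whole calculation in one place.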

Where to buy the book?

To make sure that my reviews are completely fair, I’ve decided not to use any affiliate links on this blog. Therefore, I will show multiple ways of buying the book, without recommending one over another.
