The Pandas 1.x Cookbook, by Matt Harrison and Theodore Petrou, is now in its 2nd edition, published this February. The first edition came out in 2017 and covered an older version of pandas (0.20). How does the Pandas 1.x Cookbook meet the needs of a business analysis user? Similarly to many “cookbook” style IT books,
Month: September 2020
Sort and rank data with pandas
Businesses like to know what their top-performing products, markets, and segments are. Therefore, you’ll need to sort and rank data quite often. Usually, you would do this in Excel, but you can do it more quickly with pandas. Sort data within a DataFrame For example, in this post we split the superstore data into 2
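As a rough sketch of sorting and ranking, here is a tiny example. The DataFrame and its figures are made up for illustration and are not taken from the superstore data; `sort_values()` and `rank()` are the pandas methods doing the work.

```python
import pandas as pd

# Hypothetical sales figures, invented purely for illustration.
sales = pd.DataFrame({
    "product": ["Chairs", "Phones", "Binders", "Tables"],
    "sales": [3200.0, 4100.0, 950.0, 2800.0],
})

# Sort from the highest to the lowest sales value.
top_sellers = sales.sort_values("sales", ascending=False).reset_index(drop=True)

# Rank each product, where 1 is the best seller.
top_sellers["rank"] = top_sellers["sales"].rank(ascending=False).astype(int)

print(top_sellers)
```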
Combine data sets by merging or concatenating
Have you ever tied up your computer for hours with a VLOOKUP? Or laboriously copy-pasted rows of .csv data to the end of an Excel spreadsheet for a recurring report? If so, this post will save you more time than anything before! When you combine data using Python, you can improve your speed and
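To give a feel for both operations, here is a minimal sketch. The table and column names are hypothetical: `merge()` plays the role of the VLOOKUP, and `pd.concat()` replaces pasting rows onto the end of a sheet.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": ["A", "B", "A"]})
customers = pd.DataFrame({"customer_id": ["A", "B"], "region": ["East", "West"]})

# Like a VLOOKUP: look up each order's region from the customers table.
orders_with_region = orders.merge(customers, on="customer_id", how="left")

# Like pasting new rows at the bottom: stack this period's orders under the old ones.
new_orders = pd.DataFrame({"order_id": [4], "customer_id": ["B"]})
all_orders = pd.concat([orders, new_orders], ignore_index=True)

print(orders_with_region)
print(all_orders)
```

A left merge keeps every order even when no matching customer exists, which is usually what you want when the orders table is the report you are building.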
Grouping and aggregating data in pandas
Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Or else, we might be looking at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
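As a small illustration of sales by product category, assuming a made-up DataFrame with `category` and `sales` columns:

```python
import pandas as pd

# Hypothetical data, invented for illustration.
sales = pd.DataFrame({
    "category": ["Furniture", "Technology", "Furniture", "Technology"],
    "sales": [100.0, 250.0, 150.0, 50.0],
})

# Group the rows by category, then add up the sales within each group.
sales_by_category = sales.groupby("category")["sales"].sum()

print(sales_by_category)
```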
Handle missing values in your DataFrame
When will you get missing values in your data? Quite often, actually! For instance, in customer survey data, respondents may not have answered every question. Or your company may have some products that nobody bought. In order to deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
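Here is a minimal sketch of the usual three moves: count the gaps, drop them, or fill them. The survey data is hypothetical, and filling with 0 is only one possible choice that may not suit your data.

```python
import numpy as np
import pandas as pd

# Hypothetical survey results; respondent 2 skipped the question.
survey = pd.DataFrame({
    "respondent": [1, 2, 3],
    "score": [4.0, np.nan, 5.0],
})

# Count the missing values in each column.
missing_per_column = survey.isna().sum()

# Option 1: keep only the rows where the score was answered.
answered_only = survey.dropna(subset=["score"])

# Option 2: replace missing scores with a placeholder value (0 is an assumption here).
filled = survey.fillna({"score": 0})

print(missing_per_column)
```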
How to filter a DataFrame: Focus on specific data
In the past few posts, we’ve discussed how to read your data into pandas, and then manipulate it via calculations. At this point, we’ve come to the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a
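As a quick sketch of filtering, assuming a made-up orders table, you can combine conditions with `&` and wrap each one in parentheses:

```python
import pandas as pd

# Hypothetical orders, invented for illustration.
orders = pd.DataFrame({
    "region": ["East", "West", "East"],
    "sales": [500.0, 120.0, 80.0],
})

# Keep only the East region orders with sales above 100.
big_east = orders[(orders["region"] == "East") & (orders["sales"] > 100)]

print(big_east)
```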
Loops and list comprehension concepts
My intention is to help you understand every line of code posted. As such, I need to explain the concepts of loops and list comprehension. Consequently, you will be able to follow and reproduce the function written in this post. Loops help you make changes to every item in a list OK, so I am
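To show the two side by side, here is a small sketch with a made-up list of prices. It is not the function from the post itself; both versions simply halve every price and produce the same list.

```python
# Hypothetical prices, invented for illustration.
prices = [10.0, 20.0, 30.0]

# A loop: visit each item in turn and build up a new list.
discounted_loop = []
for price in prices:
    discounted_loop.append(price * 0.5)

# A list comprehension: the same logic written on one line.
discounted_comprehension = [price * 0.5 for price in prices]

print(discounted_loop)
print(discounted_comprehension)
```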
Working with date and time in Pandas
When looking at business data, time is always an important factor. From the start, you will focus on pulling your metrics over a relevant (and likely to be recent) time period. Next, you will drill down into trends over time, and compare performance over similar past periods. As a result, you will spend considerable time
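As a rough sketch, assuming a hypothetical `order_date` column stored as text, you can convert it with `pd.to_datetime()`, keep a recent period, and then total sales by month:

```python
import pandas as pd

# Hypothetical orders with dates stored as text.
orders = pd.DataFrame({
    "order_date": ["2020-07-15", "2020-08-03", "2020-08-20"],
    "sales": [100.0, 200.0, 50.0],
})

# Convert the text into real dates so pandas can compare and group them.
orders["order_date"] = pd.to_datetime(orders["order_date"])

# Keep only the orders from 1 August 2020 onwards.
recent = orders[orders["order_date"] >= pd.Timestamp("2020-08-01")]

# Total the sales for each calendar month.
monthly_sales = orders.groupby(orders["order_date"].dt.to_period("M"))["sales"].sum()

print(monthly_sales)
```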
Write your own functions: customize your metrics
If you use Excel heavily, you will have used its functions at some point. In short, a function is a “black box” of code with a useful purpose that you need over and over again. Once it is written, anyone can call it and get a consistent result. In this post, you will learn how to write your
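For a flavour of what this looks like, here is a minimal sketch of a custom metric. The name `profit_margin` and its zero-sales rule are assumptions made for this example.

```python
def profit_margin(sales, profit):
    """Return profit as a fraction of sales, or 0.0 when sales is zero."""
    if sales == 0:
        # Avoid dividing by zero for products with no sales.
        return 0.0
    return profit / sales


# Call the function the same way every time you need the metric.
print(profit_margin(200.0, 50.0))
```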
Rename columns in your DataFrame
Humans and computers like their text set up in different ways. For humans, you want familiar terms with capital letters and spacing in the right places. However, computers don’t like spaces between words. Therefore, this post is about how to rename the columns and data in your DataFrames. With this, you can make your text
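As a short sketch, assuming made-up column names, you can rename one column with `rename()` or clean every column name at once through the `.str` methods:

```python
import pandas as pd

# Hypothetical DataFrame with human-friendly column names.
df = pd.DataFrame({"Order ID": [1, 2], "Customer Name": ["Ann", "Bob"]})

# Rename a single column; rename() returns a new DataFrame.
renamed = df.rename(columns={"Order ID": "order_id"})

# Or make every column name lowercase with underscores instead of spaces.
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

print(list(df.columns))
```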