Why this topic? This topic doesn’t have much documentation, yet it is potentially very useful. Calculating moving averages in Excel is time-consuming. Furthermore, when you use a window function in SQL you must structure the query carefully. Why not do it in Pandas instead? What it covers: First, I demonstrate how to use .rolling() to get
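As a quick illustration of the idea, here is a minimal sketch of a moving average with `.rolling()`. The daily `sales` series and its dates are hypothetical, made up only for this example:

```python
import pandas as pd

# Hypothetical daily sales figures, indexed by date
sales = pd.DataFrame(
    {"sales": [10, 20, 30, 40, 50, 60, 70]},
    index=pd.date_range("2021-01-01", periods=7, freq="D"),
)

# 3-day simple moving average: each value is the mean of the current row
# and the two rows before it; the first two rows are NaN because the
# window is not yet full
sales["sma_3"] = sales["sales"].rolling(window=3).mean()

# min_periods=1 averages whatever rows are available at the start
# instead of returning NaN
sales["sma_3_partial"] = sales["sales"].rolling(window=3, min_periods=1).mean()

print(sales)
```

For comparison, the `sma_3` column is roughly what a SQL window function such as `AVG(sales) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` would return, without having to restructure a query around it.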
Tag: Python
Pandas for Productivity Vlog Ep2: Think like a data engineer!
Why this topic? When we join two flat files, or a flat file to a SQL query output in Python, they probably come from 2 different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through 2 examples of joining data from completely different sources. In both
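To make the "different sources" problem concrete, here is a minimal sketch assuming two hypothetical inputs: a flat file whose customer IDs are padded text with stray spaces, and a SQL output whose IDs are integers under a different column name. The key is cleaned on one side before merging:

```python
import pandas as pd

# Hypothetical flat-file export: IDs stored as text with padding and stray spaces
csv_orders = pd.DataFrame({
    "Customer ID": [" 001", "002 ", "003", "004"],
    "order_total": [120.0, 80.0, 45.5, 60.0],
})

# Hypothetical SQL query output: IDs stored as integers, different column name
sql_customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Align the key: same column name and the same data type on both sides
csv_orders = csv_orders.rename(columns={"Customer ID": "customer_id"})
csv_orders["customer_id"] = pd.to_numeric(csv_orders["customer_id"].str.strip())

# indicator=True adds a _merge column showing where each row was found
combined = csv_orders.merge(sql_customers, on="customer_id", how="left", indicator=True)

# Rows that exist only in the flat file need investigating before reporting
unmatched = combined[combined["_merge"] == "left_only"]
print(combined)
print(unmatched)
```

Checking the `_merge` column is a cheap way to catch keys that still don't line up after cleaning.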
Pandas for Productivity Vlog Ep1: Relabeling x-axis in Matplotlib
Welcome to the launch of the Pandas for Productivity vlog series! Here, I discuss the peskier data wrangling challenges that you may encounter. For my first episode, I show you how to relabel the x-axis on a stacked area time series chart in Matplotlib. Why this topic? Matplotlib has been inconsistent with plotting time-series data.
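One way to take control of the x-axis labels is sketched below. It is not necessarily the approach used in the episode: it plots a stacked area chart against plain integer positions and then relabels the ticks with formatted month names. The monthly revenue data and product-line names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue by product line
df = pd.DataFrame(
    {"Hardware": [5, 6, 7, 6, 8, 9], "Software": [3, 4, 4, 5, 6, 7]},
    index=pd.date_range("2020-01-01", periods=6, freq="MS"),
)

fig, ax = plt.subplots()
# Plot against integer positions rather than dates, so the tick
# positions are fully under our control
positions = list(range(len(df)))
ax.stackplot(positions, df["Hardware"], df["Software"], labels=list(df.columns))

# Relabel the x-axis: one tick per month, formatted like "Jan 2020"
ax.set_xticks(positions)
ax.set_xticklabels(list(df.index.strftime("%b %Y")), rotation=45)
ax.legend(loc="upper left")
fig.tight_layout()
fig.savefig("stacked_area.png")
```

The trade-off is that Matplotlib no longer treats the axis as dates, so you choose the tick spacing and label format yourself.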
Plotting charts in Python vs. Excel: A Demo
We previously discussed that plotting charts in Python is a second priority for beginners. Admittedly, Python’s matplotlib library is very useful for creating elegant charts from large data sets. However, you need to remember a lot of code to make it work well! Example: COVID-19 public data set. We will use the COVID-19 Case Surveillance
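Pandas’ own `.plot()` wrapper can cut down the Matplotlib code you need to remember. Here is a minimal sketch that counts cases per month and draws a bar chart; the `report_date` column and its values are hypothetical and are not taken from the actual COVID-19 data set:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical case-level records: one row per reported case
cases = pd.DataFrame({
    "report_date": pd.to_datetime([
        "2020-03-05", "2020-03-20", "2020-04-02",
        "2020-04-15", "2020-04-28", "2020-05-10",
    ]),
})

# Count cases per calendar month
monthly = cases.groupby(cases["report_date"].dt.to_period("M")).size()

# pandas calls Matplotlib under the hood, so one call draws the bars
ax = monthly.plot(kind="bar", title="Reported cases per month")
ax.set_xlabel("Month")
ax.set_ylabel("Cases")
plt.tight_layout()
plt.savefig("monthly_cases.png")
```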
Pandas 1.x Cookbook Review
The Pandas 1.x Cookbook, by Matt Harrison and Theodore Petrou, is now in its 2nd edition, published this February. The first edition came out in 2017 and covered an older, pre-1.0 version of pandas. How does the Pandas 1.x Cookbook meet the needs of a business analysis user? Like many “cookbook” style IT books,
Sort and rank data with pandas
Businesses like to know what their top-performing products, markets, and segments are. Therefore, you’ll need to sort and rank data quite often. Usually, you would do this in Excel, but you can do it more quickly with pandas. Sort data within a DataFrame. For example, in this post we split the superstore data into 2
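A minimal sketch of sorting and ranking in pandas, assuming a small hypothetical table whose column names are loosely modelled on superstore-style sales data (the values are made up):

```python
import pandas as pd

# Hypothetical sales by region and sub-category
df = pd.DataFrame({
    "Region": ["East", "East", "East", "West", "West", "West"],
    "Sub-Category": ["Chairs", "Phones", "Binders", "Chairs", "Phones", "Binders"],
    "Sales": [500, 700, 200, 650, 300, 300],
})

# Sort: largest sales first
by_sales = df.sort_values("Sales", ascending=False)

# Rank within each region: 1 = top seller; method="min" gives ties the same rank
df["rank_in_region"] = (
    df.groupby("Region")["Sales"].rank(ascending=False, method="min").astype(int)
)

# Top sub-category per region
top = df[df["rank_in_region"] == 1]
print(by_sales)
print(top)
```

Ranking inside `groupby()` is what makes "top product per market" a one-liner instead of a manual sort per region.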
Combine data sets by merging or concatenating
Have you ever tied up your computer for hours with a vlookup? Or laboriously copy-pasted rows of .csv data to the end of an Excel spreadsheet for a recurring report? If so, this post will save you more time than anything before! When you combine data using Python, you can improve your speed and
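Both chores map onto one pandas call each. Here is a minimal sketch with hypothetical weekly order extracts and a hypothetical product price table: `pd.concat()` replaces copy-pasting rows, and `.merge()` plays the role of the vlookup:

```python
import pandas as pd

# Hypothetical weekly extracts with the same columns (e.g. two .csv files)
week1 = pd.DataFrame({"order_id": [1, 2], "product_id": ["A", "B"], "qty": [3, 1]})
week2 = pd.DataFrame({"order_id": [3, 4], "product_id": ["A", "C"], "qty": [2, 5]})

# Concatenate: stack rows end to end instead of copy-pasting them
orders = pd.concat([week1, week2], ignore_index=True)

# Hypothetical lookup table, the kind of sheet a vlookup would point at
products = pd.DataFrame({"product_id": ["A", "B", "C"], "price": [9.99, 24.50, 4.25]})

# Merge: match every order to its product price in one step
orders = orders.merge(products, on="product_id", how="left")
orders["revenue"] = orders["qty"] * orders["price"]
print(orders)
```

`how="left"` keeps every order even if its product is missing from the lookup table, which mirrors how a vlookup leaves an error rather than dropping the row.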
Grouping and aggregating data in pandas
Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Or else, we might be looking at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
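To illustrate the examples above, here is a minimal sketch of `.groupby()` with named aggregations. The order lines, categories, and figures are hypothetical:

```python
import pandas as pd

# Hypothetical order lines
df = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology", "Office Supplies"],
    "Order Date": pd.to_datetime(
        ["2020-01-15", "2020-02-03", "2020-01-20", "2020-02-25", "2020-02-10"]
    ),
    "Sales": [200.0, 300.0, 1000.0, 500.0, 50.0],
    "Profit": [20.0, -10.0, 150.0, 100.0, 5.0],
})

# Sales and profit by category, with named aggregations
by_category = df.groupby("Category").agg(
    total_sales=("Sales", "sum"),
    total_profit=("Profit", "sum"),
)
# Profit margin per category, computed from the aggregated totals
by_category["margin"] = by_category["total_profit"] / by_category["total_sales"]

# Grouping by month works the same way: group on a period derived from the date
monthly_sales = df.groupby(df["Order Date"].dt.to_period("M"))["Sales"].sum()
print(by_category)
print(monthly_sales)
```

Computing the margin after aggregating (total profit over total sales) avoids the common mistake of averaging per-row margins.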
Handle missing values in your DataFrame
When will you get missing values in your data? Quite often, actually! For instance, you may have customer survey data where respondents did not answer every question, or your company may have some products that nobody bought. To deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
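Here is a minimal sketch of the usual find / drop / fill steps, assuming a hypothetical survey table where skipped answers show up as missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses: NaN/None where a respondent skipped a question
survey = pd.DataFrame({
    "respondent": ["r1", "r2", "r3", "r4"],
    "satisfaction": [4, np.nan, 5, 3],
    "recommend": ["Yes", "No", None, "Yes"],
})

# Find missing values: number of missing entries per column
missing_per_column = survey.isna().sum()

# Option 1: drop rows that have any missing answer
complete_only = survey.dropna()

# Option 2: fill gaps with a value that makes sense for each column
filled = survey.fillna({
    "satisfaction": survey["satisfaction"].mean(),
    "recommend": "No answer",
})
print(missing_per_column)
print(filled)
```

Which option fits depends on the question: dropping rows shrinks the sample, while filling keeps it but changes the data.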
Loops and list comprehension concepts
My intention is to help you understand every line of code posted. As such, I need to explain the concepts of loops and list comprehension. That way, you will be able to follow and reproduce the function written in this post. Loops help you make changes to every item in a list. OK, so I am
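A minimal sketch comparing the two concepts, assuming a hypothetical list of prices and a made-up 10% tax rate:

```python
# Hypothetical list of prices in dollars
prices = [19.99, 5.00, 12.50]

# A for loop: visit each item and append the changed value to a new list
with_tax = []
for price in prices:
    with_tax.append(round(price * 1.1, 2))

# The same change as a list comprehension: one line, same result
with_tax_lc = [round(price * 1.1, 2) for price in prices]

# A comprehension can also filter items with an if clause
over_ten = [price for price in prices if price > 10]
print(with_tax, with_tax_lc, over_ten)
```

The loop spells out each step, while the comprehension packs the same logic into one expression; both build a new list and leave `prices` unchanged.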