Pandas for Productivity Vlog Ep 5: Moving averages in Pandas

Why this topic? Moving averages in Pandas don’t have much documentation, yet they are potentially very useful. Calculating moving averages in Excel is time-consuming. Furthermore, when you use a window function in SQL, you must structure the query carefully. Why not do it in Pandas instead? What it covers: First, I demonstrate how to use .rolling() to get […]
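As a rough illustration of the .rolling() idea mentioned above, here is a minimal sketch. The sales figures, dates and the 3-day window are made up for the example and are not taken from the episode.

```python
import pandas as pd

# Hypothetical daily sales figures, invented for illustration only
sales = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
    name="sales",
)

# 3-day moving average: each value is the mean of the current row
# and the two rows before it. The first two rows are NaN because
# the window is not yet full.
moving_avg = sales.rolling(window=3).mean()
print(moving_avg)
```

If you would rather get a value for the early rows too, passing min_periods=1 to .rolling() averages over whatever rows are available so far.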

Pandas for Productivity Vlog Ep2: Think like a data engineer!

Why this topic? When we join two flat files, or a flat file to a SQL query output in Python, they probably come from two different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through two examples of joining data from completely different sources. In both […]
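To make the point about mismatched sources concrete, here is a small sketch. It is not one of the two examples from the episode: the DataFrames, column names and values are all hypothetical. It assumes the join key is stored as zero-padded text in one source and as an integer in the other.

```python
import pandas as pd

# Hypothetical flat file: customer_id stored as zero-padded text
flat_file = pd.DataFrame({
    "customer_id": ["001", "002", "003"],
    "region": ["East", "West", "North"],
})

# Hypothetical SQL query output: customer_id stored as an integer
query_output = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "revenue": [500, 300, 200],
})

# Bring the key to a common type before joining
flat_file["customer_id"] = flat_file["customer_id"].astype(int)

# An outer join with indicator=True adds a _merge column showing
# which rows matched and which exist in only one source
merged = flat_file.merge(
    query_output, on="customer_id", how="outer", indicator=True
)
print(merged)
```

Checking the _merge column before moving on is a cheap way to see how many rows failed to match.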

Pandas for Productivity vlog Ep1: Relabeling x-axis in Matplotlib

Welcome to the launch of the Pandas for Productivity vlog series! Here, I discuss the peskier data wrangling challenges that you may encounter. For my first episode, I show you how to relabel the x-axis on a stacked area time series chart in Matplotlib. Why this topic? Matplotlib has been inconsistent with plotting time-series data. […]

Plotting charts in Python vs. Excel: A Demo

We previously discussed that plotting charts in Python is a second priority for beginners. Indeed, Python’s matplotlib library is very useful for creating elegant charts from large data sets. However, you need to remember a lot of code to make it work well! Example: COVID-19 public data set. We will use the COVID-19 Case Surveillance […]

Grouping and aggregating data in pandas

Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Or we might be looking at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method […]
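For instance, the "sales by product category" question could look something like the sketch below. The orders table and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical order lines, invented for illustration only
orders = pd.DataFrame({
    "category": ["Toys", "Books", "Toys", "Books", "Games"],
    "sales": [100, 50, 150, 70, 30],
})

# Group the rows by category, then total the sales within each group
sales_by_category = orders.groupby("category")["sales"].sum()
print(sales_by_category)
```

Swapping .sum() for .mean() or .count() answers a different question with the same grouping.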

Handle missing values in your DataFrame

When will you get missing values in your data? Quite often, actually! For instance, you might have customer survey data where respondents did not answer every question. Or your company might have some products that nobody bought. In order to deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing […]
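As a quick sketch of the survey scenario, the snippet below counts, fills and drops missing answers. The survey data and the choice to fill with 0 are assumptions made for the example; the right fill value depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; NaN marks an unanswered question
survey = pd.DataFrame({
    "respondent": ["A", "B", "C"],
    "q1": [5, np.nan, 3],
    "q2": [np.nan, np.nan, 4],
})

# Count the missing values in each column
missing_per_column = survey.isna().sum()
print(missing_per_column)

# Option 1: replace missing answers with a placeholder value (0 here)
filled = survey.fillna({"q1": 0, "q2": 0})

# Option 2: keep only respondents who answered every question
complete_rows = survey.dropna()
```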