Pandas for Productivity Vlog Ep 5: Moving averages in Pandas

Why this topic? Moving averages don’t have much documentation, yet they are potentially very useful. Calculating them in Excel is time-consuming, and a window function in SQL requires a carefully structured query. Why not do it in Pandas instead? What it covers: First, I demonstrate how to use .rolling() to get
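To give a flavour of the .rolling() approach mentioned in this teaser, here is a minimal sketch. The daily sales data, the column names, and the 3-day window are all assumptions made for illustration; they are not taken from the episode.

```python
import pandas as pd

# Hypothetical daily sales figures (illustrative data only)
sales = pd.DataFrame(
    {"sales": [10, 20, 30, 40, 50]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# 3-day moving average: the first two rows are NaN because
# the window does not yet contain three observations
sales["ma_3"] = sales["sales"].rolling(window=3).mean()

# With min_periods=1, pandas averages whatever rows are available
# so far instead of returning NaN for the early rows
sales["ma_3_partial"] = sales["sales"].rolling(window=3, min_periods=1).mean()

print(sales)
```

Whether to leave the early rows as NaN or to use min_periods depends on how you want the start of the series to be read; the partial averages are based on fewer data points than the rest.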

Pandas for Productivity Vlog Ep 2: Think like a data engineer!

Why this topic? When we join two flat files, or a flat file to a SQL query output, in Python, they probably come from two different sources. Therefore, we can’t assume that they were engineered to be combined directly. What it covers: I walk through two examples of joining data from completely different sources. In both
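One common way this problem shows up is a join key that is stored differently in each source. The sketch below is an assumed scenario, not one of the episode's two examples: the tables, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical flat file: IDs were read in as zero-padded strings
orders = pd.DataFrame(
    {"customer_id": ["001", "002", "003"], "amount": [100, 250, 75]}
)

# Hypothetical SQL query output: IDs are integers
customers = pd.DataFrame(
    {"customer_id": [1, 2, 4], "name": ["Ann", "Bob", "Dee"]}
)

# Align the key's data type before merging; otherwise the keys cannot match
orders["customer_id"] = orders["customer_id"].astype(int)

# indicator=True adds a "_merge" column showing where each row came from;
# validate="many_to_one" raises an error if customer_id repeats in customers
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    indicator=True,
    validate="many_to_one",
)

# Orders with no matching customer record
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

Checking the unmatched rows after every merge is a cheap way to catch keys that look the same to a person but differ in type, padding, or spelling.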

Grouping and aggregating data in pandas

Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category, or look at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
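As a minimal sketch of the "sales by product category" question, assuming a hypothetical table with category, revenue, and cost columns (none of these names come from the post):

```python
import pandas as pd

# Hypothetical order-level data (illustrative only)
sales = pd.DataFrame(
    {
        "category": ["Toys", "Toys", "Books", "Books", "Books"],
        "revenue": [100, 150, 40, 60, 50],
        "cost": [60, 90, 30, 40, 30],
    }
)

# Named aggregation: each output column is (input column, function)
by_category = sales.groupby("category").agg(
    total_revenue=("revenue", "sum"),
    total_cost=("cost", "sum"),
    orders=("revenue", "count"),
)

# Profit margin per category, computed from the aggregated totals
by_category["margin"] = (
    by_category["total_revenue"] - by_category["total_cost"]
) / by_category["total_revenue"]

print(by_category)
```

Computing the margin after aggregating, rather than averaging per-row margins, weights each category's margin by its actual revenue.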

Handle missing values in your DataFrame

When will you get missing values in your data? Quite often, actually! For instance, in customer survey data, respondents may not have answered every question. Or your company may have some products that nobody bought. To deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
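Here is a minimal sketch of finding and handling missing values, using an assumed survey table. The column names and the choice to fill with the column mean are illustrative assumptions, not the post's own recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; NaN marks an unanswered question
survey = pd.DataFrame(
    {
        "respondent": [1, 2, 3, 4],
        "q1": [5, np.nan, 3, 4],
        "q2": [np.nan, np.nan, 2, 5],
    }
)

# Count the missing values in each column
missing_per_column = survey.isna().sum()

# Option 1: keep only respondents who answered both questions
complete = survey.dropna(subset=["q1", "q2"])

# Option 2: fill gaps in q1 with that column's mean, leaving q2 untouched
filled = survey.fillna({"q1": survey["q1"].mean()})

print(missing_per_column)
```

Dropping rows loses data, while filling invents it, so which option is appropriate depends on what the analysis is meant to show.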

How to filter a DataFrame: Focus on specific data

In the past few posts, we’ve discussed how to read your data into pandas, and then manipulate it via calculations. At this point, we’ve come to the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a
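As a minimal sketch of filtering, assuming a hypothetical sales table (the columns and thresholds are made up for illustration):

```python
import pandas as pd

# Hypothetical sales data (illustrative only)
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "North"],
        "product": ["A", "B", "C", "A"],
        "revenue": [120, 80, 200, 50],
    }
)

# A boolean mask keeps only the rows where the condition is True
east = sales[sales["region"] == "East"]

# Combine conditions with & (and) or | (or); each needs its own parentheses
big_east = sales[(sales["region"] == "East") & (sales["revenue"] > 150)]

# .isin() matches any value in a list
a_or_b = sales[sales["product"].isin(["A", "B"])]

# .query() expresses the same kind of filter as a string
low_revenue = sales.query("revenue < 100")

print(big_east)
```

Each filter returns a new DataFrame and leaves the original unchanged, so several views of the same data can be compared side by side.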