Pandas for Productivity Vlog Ep3: Creating time series-based ranges in NumPy

Why this topic? There’s a lot of documentation on using NumPy’s arange function to create ranges of numbers. However, not much is said about creating date- and time-based ranges, which are much more useful for business. What it covers: First, I give a quick review of the NumPy arange function syntax. Then, I walk
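A minimal sketch of date- and time-based ranges with `np.arange`, assuming NumPy's `datetime64` and `timedelta64` types. The dates are arbitrary examples, and the episode itself may approach this differently.

```python
import numpy as np

# Numeric range: start, stop (exclusive), step
numbers = np.arange(0, 10, 2)
print(numbers)  # [0 2 4 6 8]

# Daily dates: strings are parsed as datetime64; the stop date is excluded
days = np.arange("2024-01-01", "2024-01-08", dtype="datetime64[D]")

# Weekly dates: pass a timedelta64 as the step
weeks = np.arange(np.datetime64("2024-01-01"), np.datetime64("2024-02-01"), np.timedelta64(7, "D"))

# Monthly periods: the unit in the dtype sets the step size
months = np.arange("2024-01", "2024-07", dtype="datetime64[M]")

# Hourly timestamps within one day
hours = np.arange(np.datetime64("2024-01-01T00:00"), np.datetime64("2024-01-01T06:00"), np.timedelta64(1, "h"))

print(days)
print(weeks)
print(months)
print(hours)
```

As with numeric ranges, the stop value is exclusive, so `days` runs from 1 January through 7 January.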

Pandas for Productivity Vlog Ep2: Think like a data engineer!

Why this topic? When we join two flat files in Python, or a flat file to the output of a SQL query, the two datasets probably come from different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through two examples of joining data from completely different sources. In both
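A minimal sketch of the kind of cleanup a cross-source join can need, using hypothetical data: the key column has a different name, a different dtype, and stray whitespace in each source. This is an illustration, not one of the two examples from the episode.

```python
import pandas as pd

# Hypothetical flat-file export: IDs stored as text with stray spaces
csv_orders = pd.DataFrame({
    "customer_id": [" 101", "102 ", "103"],
    "amount": [250.0, 80.0, 120.0],
})

# Hypothetical SQL query result: IDs stored as integers under another name
sql_customers = pd.DataFrame({
    "CustomerID": [101, 102, 104],
    "region": ["East", "West", "North"],
})

# Align the key's format and name before joining
csv_orders["customer_id"] = csv_orders["customer_id"].str.strip().astype("int64")
sql_customers = sql_customers.rename(columns={"CustomerID": "customer_id"})

# indicator=True adds a _merge column showing which rows found a match
merged = csv_orders.merge(sql_customers, on="customer_id", how="left", indicator=True)
print(merged)
```

Checking the `_merge` column after the join is a quick way to spot keys that exist in only one source (here, customer 103).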

Pandas for Productivity Vlog Ep1: Relabeling the x-axis in Matplotlib

Welcome to the launch of the Pandas for Productivity vlog series! Here, I discuss the peskier data-wrangling challenges that you may encounter. For my first episode, I show you how to relabel the x-axis on a stacked area time-series chart in Matplotlib. Why this topic? Matplotlib has been inconsistent in how it plots time-series data.
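One possible way to relabel the x-axis of a stacked area chart, sketched under assumptions: the data is hypothetical, and the chart is plotted against integer positions so that the tick labels can be set explicitly from the dates. This may differ from the approach shown in the episode.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales for two products
df = pd.DataFrame(
    {"Product A": [10, 12, 15, 14, 18, 20], "Product B": [5, 7, 6, 9, 11, 10]},
    index=pd.date_range("2024-01-01", periods=6, freq="MS"),
)

fig, ax = plt.subplots()
positions = list(range(len(df)))  # plot against 0..n-1 instead of raw dates
ax.stackplot(positions, df["Product A"], df["Product B"], labels=list(df.columns))

# Relabel the x-axis: one tick per row, formatted like "Jan 2024"
ax.set_xticks(positions)
ax.set_xticklabels(df.index.strftime("%b %Y"), rotation=45)
ax.legend(loc="upper left")
fig.tight_layout()
fig.savefig("stacked_area.png")
```

Setting the ticks and labels yourself trades automatic date formatting for full control over what appears on the axis.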

Plotting charts in Python vs. Excel: A Demo

In an earlier post, we discussed that plotting charts in Python is a secondary priority for beginners. Admittedly, Python’s matplotlib library is very useful for creating elegant charts from large data sets. However, you need to remember a lot of code to make it work well!

Example: COVID-19 public data set

We will use the COVID-19 Case Surveillance
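To give a sense of how much code even a simple chart takes, here is a minimal sketch that counts records per month and draws a bar chart. The case records and the `report_date` column are hypothetical and do not reflect the real dataset's schema.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical case records: one row per case
cases = pd.DataFrame({
    "report_date": pd.to_datetime([
        "2020-03-02", "2020-03-15", "2020-04-01",
        "2020-04-20", "2020-04-28", "2020-05-05",
    ]),
})

# Count cases per calendar month
monthly = cases.groupby(cases["report_date"].dt.to_period("M")).size()

# Draw and label a bar chart, then save it
ax = monthly.plot(kind="bar", title="Cases per month")
ax.set_xlabel("Month")
ax.set_ylabel("Cases")
plt.tight_layout()
plt.savefig("cases_per_month.png")
```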

Grouping and aggregating data in pandas

Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Alternatively, we might be looking at profit margins by customer. Even when charting business performance over time, we group our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
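A minimal sketch of both kinds of grouping described above, using hypothetical sales records: totals by product category, and several aggregations at once by calendar month.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17", "2024-03-09"]),
    "category": ["Hardware", "Software", "Hardware", "Software", "Hardware"],
    "revenue": [1200.0, 800.0, 950.0, 1100.0, 700.0],
})

# Total revenue by product category
by_category = sales.groupby("category")["revenue"].sum()
print(by_category)

# Sum, mean, and count of revenue, grouped by calendar month
by_month = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].agg(["sum", "mean", "count"])
print(by_month)
```

Grouping by `dt.to_period("M")` is one way to bucket dates by month; `"Q"` would bucket them by quarter instead.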

Handle missing values in your DataFrame

When will you get missing values in your data? Quite often, actually! For instance, in customer survey data, respondents may not have answered every question. Alternatively, your company may have some products that nobody bought. To deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
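A minimal sketch of finding and handling missing values, using hypothetical survey responses where `NaN` marks an unanswered question. Dropping and filling are shown as two common options; which one fits depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; NaN marks an unanswered question
survey = pd.DataFrame({
    "respondent": ["A", "B", "C", "D"],
    "q1_score": [4, np.nan, 5, 3],
    "q2_score": [np.nan, np.nan, 2, 4],
})

# Count missing values in each column
print(survey.isna().sum())

# Option 1: keep only respondents who answered every question
complete = survey.dropna()

# Option 2: fill each gap with that column's mean
filled = survey.fillna({
    "q1_score": survey["q1_score"].mean(),
    "q2_score": survey["q2_score"].mean(),
})
print(filled)
```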

How to filter a DataFrame: Focus on specific data

In the past few posts, we’ve discussed how to read your data into pandas and then manipulate it via calculations. Now we’ve reached the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a
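A minimal sketch of common ways to filter a DataFrame, using hypothetical order data: a boolean mask, combined conditions, matching against a list, and the equivalent `DataFrame.query` syntax.

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    "region": ["East", "West", "East", "North"],
    "product": ["Widget", "Gadget", "Gadget", "Widget"],
    "revenue": [500, 1200, 300, 900],
})

# Boolean mask: rows with revenue above 400
high_value = orders[orders["revenue"] > 400]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
east_gadgets = orders[(orders["region"] == "East") & (orders["product"] == "Gadget")]

# Keep rows whose value appears in a list
selected_regions = orders[orders["region"].isin(["East", "North"])]

# The same kind of filter written as a query string
big_widgets = orders.query("product == 'Widget' and revenue > 600")
print(big_widgets)
```

The parentheses around each condition matter: `&` binds more tightly than `==` and `>`, so leaving them out raises an error or produces the wrong mask.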