Why this topic? This topic doesn’t have much documentation, yet is potentially very useful. Calculating moving averages in Excel is time-consuming. Furthermore, when you use a window function in SQL you must structure the query carefully. Why not do it in Pandas instead? What it covers: First, I demonstrate how to use .rolling() to get
Tag: Pandas
Pandas for Productivity Vlog Ep2: Think like a data engineer!
Why this topic? When we join two flat files, or a flat file to a SQL query output in Python, they probably come from 2 different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through 2 examples of joining data from completely different sources. In both
Sort and rank data with pandas
Businesses like to know what their top-performing products, markets, and segments are. Therefore, you’ll need to sort and rank data quite often. Usually, you would do this in Excel, but you can do it more quickly with pandas. Sort data within a DataFrame For example, in this post we split the superstore data into 2
Combine data sets by merging or concatenating
Have you ever tied up your computer for hours with a vlookup? Or laboriously copy pasted rows of .csv data to the end of an Excel spreadsheet on a repeating report? If so, this post will save you more time than anything before! When you combine data using Python, you can improve your speed and
Grouping and aggregating data in pandas
Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Or else, we might be looking at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
Handle missing values in your DataFrame
When will you get missing values in your data? Quite often, actually! For instance, customer survey data where respondents did not answer every question. Or else, your company has some products that nobody bought. In order to deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
How to filter a DataFrame: Focus on specific data
In the past few posts, we’ve discussed how to read your data into pandas, and then manipulate it via calculations. At this point, we’ve come to the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a
Working with date and time in Pandas
When looking at business data, time is always an important factor. From the start, you will focus on pulling your metrics over a relevant (and likely to be recent) time period. Next, you will drill down into trends over time, and compare performance over similar past periods. As a result, you will spend considerable time
Rename columns in your DataFrame
Humans and computers like their text set up in different ways. For humans, you want familiar terms with capital letters and spacing in the right places. However, computers don’t like spaces between words. Therefore, this post is about how to rename the columns and data in your DataFrames. With this, you can make your text
Explore data in your DataFrame
Your Jupyter notebook has read the data from your files and / or SQL queries. Therefore, you now have a DataFrame each for what would have been your data sheets in Excel. At this point, it’s time to inspect and explore your data. Inspect what the first few rows of data look like Firstly, to