Pandas Archives - a long, random walk...

Pandas for Productivity Vlog Ep 5: Moving averages in Pandas

November 16, 2020November 23, 2020 Hwei Yi comment

Why this topic? This topic doesn’t have much documentation, yet is potentially very useful. Calculating moving averages in Excel is time-consuming. Furthermore, when you use a window function in SQL you must structure the query carefully. Why not do it in Pandas instead? What it covers: First, I demonstrate how to use .rolling() to get

Mid-Career Tech

Pandas for Productivity Vlog Ep2: Think like a data engineer!

October 17, 2020October 17, 2020 Hwei Yi comment

Why this topic? When we join two flat files, or a flat file to a SQL query output in Python, they probably come from 2 different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through 2 examples of joining data from completely different sources. In both

Mid-Career Tech

Sort and rank data with pandas

September 28, 2020September 28, 2020 Hwei Yi comment

Businesses like to know what their top-performing products, markets, and segments are. Therefore, you’ll need to sort and rank data quite often. Usually, you would do this in Excel, but you can do it more quickly with pandas. Sort data within a DataFrame For example, in this post we split the superstore data into 2

Mid-Career Tech

Combine data sets by merging or concatenating

September 27, 2020September 27, 2020 Hwei Yi comment

Have you ever tied up your computer for hours with a vlookup? Or laboriously copy pasted rows of .csv data to the end of an Excel spreadsheet on a repeating report? If so, this post will save you more time than anything before! When you combine data using Python, you can improve your speed and

Mid-Career Tech

Grouping and aggregating data in pandas

September 26, 2020September 26, 2020 Hwei Yi comment

Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category. Or else, we might be looking at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method

Mid-Career Tech

Handle missing values in your DataFrame

September 25, 2020September 25, 2020 Hwei Yi comment

When will you get missing values in your data? Quite often, actually! For instance, customer survey data where respondents did not answer every question. Or else, your company has some products that nobody bought. In order to deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing

Mid-Career Tech

How to filter a DataFrame: Focus on specific data

September 20, 2020September 21, 2020 Hwei Yi comment

In the past few posts, we’ve discussed how to read your data into pandas, and then manipulate it via calculations. At this point, we’ve come to the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a

Mid-Career Tech

Working with date and time in Pandas

September 17, 2020September 17, 2020 Hwei Yi comment

When looking at business data, time is always an important factor. From the start, you will focus on pulling your metrics over a relevant (and likely to be recent) time period. Next, you will drill down into trends over time, and compare performance over similar past periods. As a result, you will spend considerable time

Mid-Career Tech

Rename columns in your DataFrame

September 16, 2020September 18, 2020 Hwei Yi comment

Humans and computers like their text set up in different ways. For humans, you want familiar terms with capital letters and spacing in the right places. However, computers don’t like spaces between words. Therefore, this post is about how to rename the columns and data in your DataFrames. With this, you can make your text

Mid-Career Tech

Explore data in your DataFrame

September 16, 2020September 16, 2020 Hwei Yi comment

Your Jupyter notebook has read the data from your files and / or SQL queries. Therefore, you now have a DataFrame each for what would have been your data sheets in Excel. At this point, it’s time to inspect and explore your data. Inspect what the first few rows of data look like Firstly, to

a long, random walk…

Tag: Pandas

Pandas for Productivity Vlog Ep 5: Moving averages in Pandas

Pandas for Productivity Vlog Ep2: Think like a data engineer!

Sort and rank data with pandas

Combine data sets by merging or concatenating

Grouping and aggregating data in pandas

Handle missing values in your DataFrame

How to filter a DataFrame: Focus on specific data

Working with date and time in Pandas

Rename columns in your DataFrame

Explore data in your DataFrame