Pandas for Productivity Vlog Ep 4: How to identify and fix missing data

Why this topic? This is a lead-in to next week’s episode, where I will talk about calculating moving averages in Pandas. Subsequently, I will show how nulls in the data set will affect how you want to structure your code. Therefore, I need to get everyone on the same page in understanding how nulls work. […]

Read and write files with Jupyter Notebooks

Now that you’ve set up your Jupyter notebook, you can start getting data into it. To this purpose, this post discusses how to read and write files into and out of your Jupyter Notebooks. Furthermore, it tells you about the Python libraries you need for analyzing data. First things first: Essential Python libraries Your Jupyter […]

Take ownership of your data: The end-goal

We’ve come to the end of the Survival SQL series. Hopefully, you think the posts are relevant and easy to understand. Above all, I hope that your newfound knowledge has empowered you to take greater ownership of the data in your company. What does ownership mean? Your company probably has dedicated teams of data engineers […]

Subqueries and CTEs: Multi-step problems

As you progress, you’ll find situations where you need to combine more than one data pull to answer your question. In these situations, you’ll use subqueries and join or filter the results from these. Actually, you already have seen some subqueries in the examples from the last 2 posts. This post aims to provide an […]

Window functions : Automate lengthy Excel tasks

“Find our 5 best-selling products by country.””What was the 7-day moving average revenue for every day last month?” Chances are, you’ve come across one of these business questions before. And previously, you’ve probably solved them either by creating a ticket for your analytics team, or by writing manual formulas in Excel. Instead of doing these […]

Define your own metrics: Operators & conditionals

As you become confident with pulling data, you’ll start wanting more control over the metrics you report on. Hence, you’ll want to define your own metrics. Therefore, you’ll want to learn how to use operators and conditional statements. This Kaggle notebook contains all the code you need to follow along with the examples. Run cells […]

Aggregating metrics: Using “GROUP BY”

Earlier, we’ve discussed that your database stores data at its most granular level. This means that every order creates one row in your orders table. Likewise, every click a customer makes creates one row in your clicks table. And every review you receive creates one row in your product reviews table. When this builds up […]

Filtering the data you want: The “WHERE” clause

Why is SQL necessary and useful? In essence, we have huge amounts of data and SQL enables us to extract what we need for specific business insights. So, filtering with the “WHERE” clause is a crucial part of using SQL. This Kaggle notebook contains all the code for following along with the examples. Run cells […]