Pandas for Productivity Vlog Ep 4: How to identify and fix missing data

Why this topic?

This is a lead-in to next week’s episode, where I will talk about calculating moving averages in Pandas. In that episode, I will show how nulls in the data set affect the way you structure your code, so I first need to get everyone on the same page about how nulls work.

Furthermore, this topic on its own is relevant for SQL beginners. If you have heard from others that you need to know how to handle missing data and want a concrete example, then this tutorial is for you!

What it covers:

First, I show a sample SQL query that I used to explore how to get country-level daily numbers from the COVID-19 open data set on Google BigQuery. Next, I walk through the nulls in the output and draw conclusions about what to do with each of the null values. Finally, I talk about the “coalesce” function in SQL and demonstrate how it fills in a substitute value for the nulls.
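If you want to try the idea before watching, here is a minimal sketch of what COALESCE does. It is not the query from the video: it uses Python's built-in sqlite3 module instead of BigQuery, and the table name `daily_cases`, its columns, and all of its values are hypothetical, made up purely for illustration.

```python
import sqlite3

# Hypothetical table and values, invented for this example only.
# This is NOT the BigQuery COVID-19 data set or the query from the video.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_cases (country TEXT, report_date TEXT, new_confirmed INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_cases VALUES (?, ?, ?)",
    [
        ("A", "2020-03-01", 5),
        ("A", "2020-03-02", None),  # Python None is stored as SQL NULL
        ("B", "2020-03-01", None),
        ("B", "2020-03-02", 7),
    ],
)

# Identify the nulls: COUNT(*) counts every row,
# while COUNT(column) skips rows where the column is NULL.
total_rows, non_null_rows = conn.execute(
    "SELECT COUNT(*), COUNT(new_confirmed) FROM daily_cases"
).fetchone()
null_rows = total_rows - non_null_rows

# Fill the nulls: COALESCE returns its first non-NULL argument,
# so a NULL new_confirmed is replaced by the substitute value 0.
filled = conn.execute(
    "SELECT country, report_date, COALESCE(new_confirmed, 0) AS new_confirmed "
    "FROM daily_cases ORDER BY country, report_date"
).fetchall()
conn.close()

print("rows with a null:", null_rows)
for row in filled:
    print(row)
```

Using 0 as the substitute is itself an assumption here. Whether 0, some other value, or leaving the null alone is right depends on what the missing value means in that column, which is why the nulls are worth examining one by one before filling them.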

Watch it here:
