Why this topic? There’s a lot of documentation on using NumPy’s arange function to create ranges of numbers. However, not much is said about creating date- and time-based ranges, which are much more useful for business. What it covers: First, I give a quick review of the NumPy arange function syntax. Then, I walk
Category: Mid-Career Tech
Resources to help mid-career professionals without a programming background upskill and keep up with data science. Learn SQL and Python for free, at the speed of business, while staying in your job.
Pandas for Productivity Vlog Ep2: Think like a data engineer!
Why this topic? When we join two flat files, or a flat file to a SQL query output in Python, they probably come from 2 different sources. Therefore, we can’t assume that they’re engineered to be combined directly. What it covers: I walk through 2 examples of joining data from completely different sources. In both
Pandas for Productivity vlog Ep1: Relabeling x-axis in Matplotlib
Welcome to the launch of the Pandas for Productivity vlog series! Here, I discuss the peskier data wrangling challenges that you may encounter. For my first episode, I show you how to relabel the x-axis on a stacked area time series chart in Matplotlib. Why this topic? Matplotlib has been inconsistent with plotting time-series data.
Plotting charts in Python vs. Excel: A Demo
We previously discussed here that plotting charts in Python is second priority for beginners. Indeed, Python’s matplotlib library is very useful for creating elegant charts from large data sets. However, you need to remember a lot of code to make it work well! Example: COVID-19 public data set We will use the COVID-19 Case Surveillance
Pandas 1.x Cookbook Review
The Pandas 1.x Cookbook, by Matt Harrison and Theodore Petrou, is now in its 2nd edition, published this February. The first edition, from 2017, covered an older version of pandas (0.20). How does the Pandas 1.x Cookbook meet the needs of a business analysis user? Like many “cookbook” style IT books,
Sort and rank data with pandas
Businesses like to know what their top-performing products, markets, and segments are. Therefore, you’ll need to sort and rank data quite often. Usually, you would do this in Excel, but you can do it more quickly with pandas. Sort data within a DataFrame For example, in this post we split the superstore data into 2
Combine data sets by merging or concatenating
Have you ever tied up your computer for hours with a VLOOKUP? Or laboriously copy-pasted rows of .csv data to the end of an Excel spreadsheet for a recurring report? If so, this post will save you more time than anything before! When you combine data using Python, you can improve your speed and
Grouping and aggregating data in pandas
Most business questions involve grouping and aggregating data. For example, we might want to examine sales by product category, or look at profit margins by customer. Even when charting business performance over time, we are grouping our data by month or quarter. Therefore, you’ll find that .groupby() may be the method
Handle missing values in your DataFrame
When will you get missing values in your data? Quite often, actually! For instance, customer survey data may include respondents who did not answer every question. Or your company may have some products that nobody bought. To deal with these situations, this post shows you how to handle missing values in your DataFrame. Find missing
How to filter a DataFrame: Focus on specific data
In the past few posts, we’ve discussed how to read your data into pandas, and then manipulate it via calculations. At this point, we’ve come to the stage of deriving insights. To begin, this post discusses how to filter a DataFrame. Through filtering, you can focus on the part of your data that answers a