If you use Excel heavily, you will have used their functions at some point. In short, a function is a “black box” of code with a useful purpose, that you need over and over again. Subsequently, anyone can call it and get a consistent result. In this post, you will learn how to write your own Python functions.
Syntax to write your own Python functions
As a start, we’ll break down the function from the previous post on renaming columns into its structure below. Additionally, you can get step-by-step explanations from Python For Everybody here.
When will you need to write functions to crunch business data?
Just like in Excel, you can do most arithmetic calculations among columns of a DataFrame with operators. In this case, you need to thank NumPy because its underpinnings allow you to match the values in the columns to calculate, and applies the calculation to all rows in the column. So, you’ll need to write functions only if you need multiple steps of “if-then-else” logic.
Example of using operators: Calculating profit margin
Let’s return to the superstore data set. As a start, we had converted all the column names to lower case with underscores. Next, we want to calculate the profit margin, which is profit / sales.
See what happened with the code in the red rectangle? You can create a new column and give it any name (just don’t repeat a name already in that DataFrame unless you want to replace the column). Next, you can use a mathematical expression involving other columns to tell Python how to calculate that new column. As a result, Python creates a new column (in this example, named “profit_margin”) and its values would be the result of performing the math on each row. In other words, the profit margin value in each row here is the value in the “profit” column of that row, divided by the value in the “sales” column of the same row. So far, it works just like Excel!
These are the common Python operators: +, –, *, /, ** (power), % (modulo i.e. the remainder of dividing one number by another, and // (floor i.e. the whole number quotient dividing one number by another).
Python built-in functions and libraries
In addition to the above operators, Python also has some built-in functions. These are functions that were written as part of the Python language. Many of them are more useful for developers, but the ones that may interest you immediately as an Excel user are abs(), max() and min(), which work similarly to how they would in Excel.
Further to the standard library (part of the Python language), we discussed libraries when we talked about importing numpy and pandas. Most other functions that you use in Excel, such as std(), median() etc, can be found in Numpy. Links to the official documentation are at the bottom of this post.
Solving an Excel pain point: if-then-else logic
While we’re on the topic of Excel – have you ever used the “IF” function in a spreadsheet? If you’ve done many of these, perhaps you might agree with my pain points:
- Everything is crammed tightly into one cell, so it’s hard for a third party to read.
- Furthermore, the logic is based on cell references, so a third party has to keep looking back and forth between cells to decipher the reasoning.
Therefore, using Python to apply if-then-else logic to business data helps not only your colleagues, but also the future you. Since your code clearly shows your columns and your logic, it will be easier to follow your thought process.
For instance, let’s suppose each order was supposed to ship within 2 days. So, we want to create a new column that says “late” if the shipping date minus the order date was more than 2 days, and “on time” otherwise. To achieve this, we can write a function with an “if” statement, such as this:
Syntax for “if” statement:
The Python For Everybody tutorial here provides a great foundation for understanding “if” statements. If you’ve watched this before but need a quick memory jog of the key points, remember these:
- Python keywords: use if for the first condition, elif for subsequent conditions, and else for the last one.
- You need to follow every condition and else with a colon.
- Indent all the code associated with each condition. This tells Python exactly which code to run when a particular condition is satisfied, and when to stop.
- Since you are writing a function here, your “if” statement has a single indent. Thus, the code stating the action to take for each condition will have a double-indent (see picture above). Jupyter will manage your indents for you, as long as you place your colons properly.
- Remember that Python goes through the conditions from top to bottom, so the order matters.
- Conditional operators for your “if” statements include == (equals), != (not equals), >, <, >= and <=.
Syntax for .apply: dataframe_name = dataframe_name.apply(function_name, axis = 1)
Unlike the code in the picture above, you want to save the DataFrame that you’ve applied your function to. So, you have to declare your DataFrame variable again (using =). You can use the same name, in which case you will overwrite your original DataFrame, or use a different name if you want to preserve both versions.
In this post, we’ve covered:
Things you would do in Excel | Equivalent task with Jupyter notebooks |
Create a new column in an Excel sheet via arithmetic operations on existing columns. | Create a new column in a DataFrame, using arithmetic operators on existing columns. Python arithmetic operators are: +, -, *, /, **, % and //. |
Use Excel “IF” function to create a new column. | Write a function with an “if” statement to create a new column in a DataFrame, then apply that function to the DataFrame. You can use operators and existing functions in the Python standard library or Numpy libraries to do math on existing columns. |
Technical resources
I believe that the above material is already quite technical, but if you want to be more exhaustive, you can refer to the links below.
Concepts | Where to dig deeper |
Full list of Python operators | The official list is here, but doesn’t explain the meaning of the symbols. In-depth explanations about a broad range of operators is here. For the last 2 operators, you need to go to Stack Overflow – @ is here, and := is here. |
Defining functions | Python official documentation is here. We covered the first part, but there’s more detail about default argument values and keyword arguments. |
Structure of “if” statement | Python official documentation is here, but doesn’t contain anything more detailed than we’ve discussed already. |
Full list of Python built-in functions | Built-in functions for the Python standard library are here. |
NumPy functions (not exhaustive, but the key ones you may need to replicate any functions you use in Excel). | Mathematical functions in NumPy are here. Statistical functions in NumPy are here. |
Why was .days necessary in the function for getting the difference between the ship and order dates? | This post covers concepts for working with date and time. Because the result of subtracting one date from another is a TimeDelta object, you need to extract the attribute .days from it to get a whole number of days that you can then compare with other numbers. |