Rename columns in your DataFrame

Humans and computers like their text set up in different ways. Humans prefer familiar terms, with capital letters and spaces in the right places. In code, however, spaces and mixed capitals in column names get in the way: you have to type every name exactly, and some shortcuts stop working. Therefore, this post is about how to rename the columns and data in your DataFrames. With this, you can make your text computer-friendly before crunching numbers in Python. Afterwards, you can adjust it again for human readers.

Be computer-friendly: Rename columns with underscores

First, we’ll start with the superstore data from the Tableau Certified Associate Exam guide. I used this source because it is publicly available, yet looks like actual business data. Originally, the column names in the file look like this:

[Screenshot: the original column names, before renaming them for easier processing]

Although these column names are human-friendly, they are awkward to work with in code. In particular, pandas can't expose a column name that contains spaces as an attribute, so you lose the time-saving dataframe_name.column_name shortcut. Hence, you'll want to replace the spaces with underscores, like this:

[Screenshot: renaming DataFrame columns by replacing spaces with underscores]
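Since the screenshot isn't reproduced here, below is a minimal sketch of the underscore step. It assumes a small made-up DataFrame named df with a few column names in the style of the superstore file; the data values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data with human-friendly column names
df = pd.DataFrame(
    {"Order ID": ["A-1", "A-2"], "Customer Name": ["Ann", "Bo"], "Sales": [10.5, 20.0]}
)

# .str.split(" ") turns each column name into a list of words;
# .str.join("_") glues each list back together with underscores
df.columns = df.columns.str.split(" ").str.join("_")

print(list(df.columns))  # ['Order_ID', 'Customer_Name', 'Sales']
```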

Furthermore, when you start grouping and calculating with columns, you will save time by not having to worry about capital letters. So, you can set all the column names to lowercase as well. Let's try again and add this step:

Syntax: dataframe_name.columns = dataframe_name.columns.str.split(" ").str.join("_").str.lower()
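Here is that syntax applied end to end, as a sketch on a hypothetical empty DataFrame that only has column names (the names are assumed for illustration):

```python
import pandas as pd

# Hypothetical column names in the style of the superstore data
df = pd.DataFrame(columns=["Order ID", "Ship Mode", "State Or Province", "Sales"])

# Split on spaces, rejoin with underscores, then lowercase every name
df.columns = df.columns.str.split(" ").str.join("_").str.lower()

print(list(df.columns))
# ['order_id', 'ship_mode', 'state_or_province', 'sales']
```

For names like these, df.columns.str.replace(" ", "_", regex=False).str.lower() gives the same result; the split/join version simply mirrors the steps described in this post.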

How does this save you time? Remember how we previously needed to index the DataFrame using dataframe_name['column_name'] to get a column? Now we can also get the column using dataframe_name.column_name:
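A minimal sketch of the two access styles, assuming a small made-up DataFrame whose column names are already lowercase with underscores:

```python
import pandas as pd

df = pd.DataFrame({"customer_name": ["Ann", "Bo"], "sales": [10.5, 20.0]})

# Bracket (indexing) access: works for any column name
by_bracket = df["sales"]

# Attribute access: works here because "sales" is a valid Python identifier
by_attribute = df.sales

# Both give back the same pandas Series
print(by_bracket.equals(by_attribute))  # True
print(type(by_attribute).__name__)      # Series
```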

As you analyze your data further, you will use both ways of getting at columns. The Technical references section at the bottom of this post explains how the two approaches differ and when to use each. As you work on more business questions with data, you'll get used to switching between them.
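As a rough illustration of why you still need brackets at times, here is a sketch with a hypothetical DataFrame that has a name containing a space and a column called count, which clashes with the built-in DataFrame.count method:

```python
import pandas as pd

df = pd.DataFrame({"Order ID": ["A-1", "A-2"], "count": [3, 5]})

# A name containing a space can only be reached with brackets
print(df["Order ID"].tolist())  # ['A-1', 'A-2']

# df.count is the DataFrame.count method, not the "count" column
print(callable(df.count))       # True
print(df["count"].tolist())     # [3, 5]

# Create new columns with brackets; assigning df.total = ... would set a
# plain Python attribute instead of adding a column (pandas warns about this)
df["total"] = df["count"] * 2
print(list(df.columns))         # ['Order ID', 'count', 'total']
```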

Rename columns in human-friendly ways

Now, you've completed your analysis and are ready to set up your data for creating charts. So, you want to get rid of all the ugly underscores and put capital letters back into your column names. Here's one shortcut to rename your data columns. However, note that it only capitalizes the first letter of each column name:

[Screenshot: renaming columns so that only the first letter of each column name is capitalized]
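A minimal sketch of that shortcut, assuming it reverses the earlier split/join step and then calls .str.capitalize(); the column names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(columns=["order_id", "customer_name", "state_or_province"])

# Swap underscores back to spaces, then capitalize: only the first character
# of each name becomes uppercase, and the rest of the name is lowercased
df.columns = df.columns.str.split("_").str.join(" ").str.capitalize()

print(list(df.columns))
# ['Order id', 'Customer name', 'State or province']
```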

At this point, you might not be satisfied, because the column names would look better if every word in them started with a capital letter. For that, you'll need to loop through each word in each column name and capitalize it individually. Nonetheless, this doesn't have to be time-consuming: if you write a function to do it, you only need to write it once. Here is a short way of writing the function:
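The original function isn't reproduced here, so the following is only a sketch of the same idea. The helper name capitalize_each_word is hypothetical, and it assumes the column names use underscores between words:

```python
import pandas as pd


def capitalize_each_word(column_names):
    """Hypothetical helper: turn 'state_or_province' into 'State Or Province'."""
    new_names = []
    for name in column_names:
        # Split on underscores, capitalize each word, then join with spaces
        words = [word.capitalize() for word in name.split("_")]
        new_names.append(" ".join(words))
    return new_names


df = pd.DataFrame(columns=["order_id", "customer_name", "state_or_province"])
df.columns = capitalize_each_word(df.columns)

print(list(df.columns))
# ['Order Id', 'Customer Name', 'State Or Province']
```

Note that abbreviations such as "ID" come back as "Id", which is one reason you may still want to rename a few specific columns by hand. pandas also offers .str.title() as a built-in way to capitalize every word.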

If you're short on time and just want to use the code, you can download my code here. Otherwise, if you want to learn how to do this from scratch, refer to this post to understand how to write a function. Furthermore, to step through the logic and code of the function itself, go to this post.

Change specific column names with .rename

Let's say you think “State Or Province” is too long and want to change it to just “State”. For this, you can use .rename() to change the names of specific columns, like this:

[Screenshot: renaming DataFrame columns using the .rename() method and a dictionary]
Syntax: dataframe_name.rename(columns = {'old_col_name_1': 'new_col_name_1', 'old_col_name_2': 'new_col_name_2'}, inplace = True)

First, the keyword argument “columns =” tells pandas that you are renaming columns (rather than row labels). Next, you give it a dictionary that pairs each old column name with its new name, using a colon between the two names and commas between pairs. Finally, you enclose the entire dictionary in curly brackets. Although the “Technical references” section also covers this, you can find a video tutorial on Python dictionaries here. Furthermore, note that you only need to put the column names that you want to change in the dictionary: any column not mentioned in the dictionary keeps its current name.
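A minimal sketch of a partial rename, assuming made-up column names. It assigns the result to a new variable instead of using inplace = True, so the original DataFrame is left as it was:

```python
import pandas as pd

df = pd.DataFrame(columns=["Order Id", "Customer Name", "State Or Province"])

# Only the columns named in the dictionary change; the rest keep their names
new_names = {"State Or Province": "State", "Order Id": "Order ID"}
renamed = df.rename(columns=new_names)

print(list(renamed.columns))
# ['Order ID', 'Customer Name', 'State']
```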

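Subsequently, you use the keyword argument inplace = True to tell pandas to modify your existing DataFrame directly. Without it, .rename() leaves the original DataFrame unchanged and returns a new, renamed copy, which you would need to assign to a variable.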
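The difference between the two styles can be sketched like this, again on a hypothetical one-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"State Or Province": ["Texas"], "Sales": [10.5]})

# Without inplace, .rename() returns a new DataFrame and leaves df unchanged
renamed = df.rename(columns={"State Or Province": "State"})
print(list(df.columns))       # ['State Or Province', 'Sales']
print(list(renamed.columns))  # ['State', 'Sales']

# With inplace=True, df itself is modified and the method returns None
result = df.rename(columns={"State Or Province": "State"}, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['State', 'Sales']
```

Reassigning the result, as in df = df.rename(columns=...), is an equally common way to get the same outcome.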

In this post, we’ve covered:

Things you would do in Excel: Retype or reformat column names.
Equivalent tasks with Jupyter notebooks:
1. Use the .str.split() and .str.join() methods to change between spaces and underscores.
2. Use .str.lower() and .str.capitalize() to change the capitalization of column names. Additionally, you can use .str.upper() to change all letters to capitals.
3. Write and call a function that goes into each word within a column name and capitalizes it.
4. Rename specific columns using .rename() with a dictionary.

Technical references

Conceptual nuts and bolts:
Question: Why do we put .str in front of .split(), .lower(), .capitalize(), and .join() when we do these operations on DataFrame columns?
Conceptual overview: The .columns attribute of a DataFrame is an Index object, not a single string, so it doesn't have Python's string methods itself. The .str accessor applies the string method to each column name in the Index, one name at a time.
Where to dig deeper: Start with the pandas documentation here. Furthermore, you can find a full list of Python string methods here.

Question: What is the difference between dataframe.column and dataframe['column'], and when should we use each of them?
Conceptual overview: Both return the column as a Series. dataframe.column (Method 1) accesses the column as an attribute of the DataFrame, which only works when the column name is a valid Python identifier (for example, no spaces) and doesn't clash with an existing DataFrame attribute or method such as count. dataframe['column'] (Method 2) works for any column name. Use Method 2 when creating a new column; Method 1 is a convenient shorthand for referring to existing columns in a DataFrame.
Where to dig deeper: See Stack Overflow posts here and here.

Question: How do we write and call a function?
Conceptual overview: Follow the linked tutorial, and remember these things:
1. Use the Python keywords def and return.
2. Decide what the argument(s) and result(s) of your function are.
3. Use indents! An indent is 4 spaces, or use your Jupyter notebook's auto-indent.
Where to dig deeper: Python For Everybody tutorials are here.

Question: What are keyword arguments?
Conceptual overview: A keyword argument tells the function or method what the argument is for. You write the keyword, then '=', then the value of the argument itself, as in inplace = True.
Where to dig deeper: Python documentation is here.

Question: Why is True capitalized and green in “inplace = True”?
Conceptual overview: True and False are Python's built-in Boolean values, akin to answering a yes / no question. Python requires them to be capitalized, and the notebook's syntax highlighting shows them in green in the default theme.
Where to dig deeper: This page explains how Boolean logic works.
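To see what the .str accessor does, here is a small sketch (with made-up column names) comparing it with a plain Python loop over the column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["Order ID", "Ship Mode"])

# The columns of a DataFrame are held in an Index object, not a string
print(type(df.columns).__name__)  # Index

# The .str accessor applies the string method to each name in turn,
# which matches looping over the names with the plain string method
via_accessor = list(df.columns.str.lower())
via_loop = [name.lower() for name in df.columns]

print(via_accessor == via_loop)  # True
print(via_accessor)              # ['order id', 'ship mode']
```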
Info about specific Python data structures:
List: When splitting up and capitalizing each word in the DataFrame columns, we created a list of the column names. Resource: Python For Everybody Chapter 8.
String: Each column name starts out as a string. We split the names into words with the .split() method, turning each name into a list of strings (one string per word). Resource: Python For Everybody Chapter 6.
Dictionary: We used dictionaries to rename column names in a DataFrame, and items in a data column. Resource: Python For Everybody Chapter 9.
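The three data structures can be seen together in a short plain-Python sketch of one column name's round trip; the name and mapping are illustrative:

```python
# A string: one column name
name = "State Or Province"

# .split() turns the string into a list of strings, one per word
words = name.split(" ")
print(words)                    # ['State', 'Or', 'Province']
print(type(words).__name__)     # list

# .join() turns the list back into a single string
print("_".join(words).lower())  # state_or_province

# A dictionary maps old names (keys) to new names (values)
mapping = {"State Or Province": "State"}
print(mapping["State Or Province"])  # State
```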
