How to profile your python code to improve performance
When your python scripts take too much time to execute, it’s crucial to profile which part of your code is responsible. It becomes increasingly tricky as your code grows both in size and complexity.
Let’s take an example:
run loads the Titanic csv file with pandas and does some computation on the loaded data. I want to know which part takes more time: the loading or the computation.
I tried the standard python profiler but it turned out that it does not help me much to address the performance issue. Instead, I wrote a python decorator to time my code by allowing me to profile exactly what I need.
Why the standard python profiler is not what I need?
Let’s profile our piece of python code with cProfile. To do so, load the cProfile module when you run your script:
python -m cProfile load_and_normalize.py
Alternatively, you can profile the “run” function directly in the python code:
In the end, you’ll get an output that looks like this:
Why it doesn’t help me:
- it gives way too much information: it tracks the time of all called functions, including the one called deep in pandas. Hence I got more ~300 function calls. In my example, I am only interested in the execution time of my own functions i.e.,
- “tottime” is not relevant: this is the actual time spent in the function excluding the time spent in subcalls. As all calls are tracked, it is almost always 0.000.
- exploring the profiler results is hard: time results are either print to the console or written in a file. You may use pstats or tools such as cProfilev(basic display with sorting facilities), kcachegring or gprof2dot (show call graph) to explore results, but it is heavy.
- exporting the result is not easy: in my case, I’d like to store the profiler results in a comprehensive way in an external database.
A simple python decorator for tracking time
Instead, I’ll use this simple python decorator for profiling:
It computes the execution time of the decorated function
fn and logs the result right after
fn finished to execute:
@wrap(line 6) decorator sets the meta-attributes (name, docstring, …) of the decorated function from the original function ones.
try … finallyensures the execution time is always logged, even if the decorated function raised an exception.
To use it, simply decorate the functions that you decided to profile:
Executing the function
run will print the following:
[SimpleTimeTracker] load 0.006
[SimpleTimeTracker] compute_max_ticket_price_per_class 0.003
[SimpleTimeTracker] normalize_ticket_price_per_class 0.004
[SimpleTimeTracker] run 0.010
- it is simple: the decorator is 24 lines long and has no dependencies.
- it is surgical: I only got timings of the function I decided to profile. This allows you to quickly understand which part of your script is slower.
- it is easy to export the results: the logging function
_logpassed to the decorator handles the profiler results. In the example above, it simply prints timings to the console, but I could have store the profiler results to a database.
A better python decorator that profiles partial times
The partial time of a function
foo consists of its total execution time i.e., the time delta between entering and exiting
foo, minus the execution times of the functions that are called inside the body of
In the example above, the partial execution time of the function
run is zero: its total execution time (10ms) minus execution times of
normalize_ticket_price (4ms) and
load (6ms). It means that if I want to improve the performance of the function
run, I may focus on improving
normalize_ticket_price, as literally no time is spent in the body of the function
run (which is obvious when looking at the code).
To profile partial execution times, I improved a little bit the decorator:
Note that we do need a global variable
_total_time_call_stack to compute partial execution times.
Here is the profiling results of when running the function
[BetterTimeTracker] load 0.006 0.006
[BetterTimeTracker] compute_max_ticket_price_per_class 0.003 0.003
[BetterTimeTracker] normalize_ticket_price_per_class 0.004 0.001
[BetterTimeTracker] run 0.010 0.000
Full code, documentation and examples are available here.
Following this comment on Reddit, I tried pyflame + flamegraph tools for profiling. It’s amazing: it shows which part of the python code takes more time in a concise way and without modifying the code (e.g., adding decorators).
To do so, you need to install pyflame and flamegraph on your computer, and to run the following command:
pyflame -t python your_code.py | flamegraph > profile.svg
You’ll get an SVG file that looks like this:
You can download the SVG file here. In a nutshell:
- Each bar represents a call to a function;
- Colors are randomly picked among warm colors :)
- The Y-axis represent call stacks;
- The X-axis is not the time: pyflame checks every milisecond the current call stack and the bar width represents the overall number of ticks it was in the function.
In this example, you can see most of the time (91.17%) is spent loading pandas. To get more detail about the
run function, do ctrl+f to search the SVG and click on the matching element (the tiny purple bar in the right part), you should get something like:
In the end, both pyflame and the profiling decorators allow you retrieve the execution time ratio spent in
normalize_ticket_price_per_class functions which is about 60/40.
Thanks to Reda Boumahdi, Adil Baaj, Tristan Roussel, Adrien Lina, Louis Nicolle, and Flavian Hautbois.