10 Essential Python Profiling Tools to Boost Application Performance

As a best-selling author, I invite you to explore my books on Amazon. Don’t forget to follow me on Medium and show your support. Thank you! Your support means the world!

Python performance monitoring and profiling are essential practices for maintaining efficient applications in production environments. As applications grow in complexity, identifying bottlenecks becomes increasingly challenging without proper tools. I’ve implemented numerous monitoring solutions across various projects and found that the right tooling can dramatically improve application performance.

Understanding Python Performance Profiling

Performance profiling involves measuring code execution time, memory usage, and resource consumption to identify inefficiencies. In Python, this process is particularly important due to the language’s dynamic nature and garbage collection mechanisms.

The primary metrics to monitor include CPU usage, memory consumption, execution time, and I/O operations. Before diving into specific tools, it’s important to understand what we’re measuring and why.

<span>def</span> <span>slow_function</span><span>():</span>
<span>result</span> <span>=</span> <span>0</span>
<span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>1000000</span><span>):</span>
<span>result</span> <span>+=</span> <span>i</span>
<span>return</span> <span>result</span>
<span># Without profiling, we can only guess why this is slow </span>
<span>def</span> <span>slow_function</span><span>():</span>
    <span>result</span> <span>=</span> <span>0</span>
    <span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>1000000</span><span>):</span>
        <span>result</span> <span>+=</span> <span>i</span>
    <span>return</span> <span>result</span>

<span># Without profiling, we can only guess why this is slow </span>
def slow_function(): result = 0 for i in range(1000000): result += i return result # Without profiling, we can only guess why this is slow

Enter fullscreen mode Exit fullscreen mode

cProfile: The Built-in Solution

cProfile is Python’s built-in profiling tool and often serves as the starting point for performance analysis. It provides detailed statistics about function calls, including how many times each function is called and how much time is spent in each function.

<span>import</span> <span>cProfile</span>
<span>import</span> <span>pstats</span>
<span>from</span> <span>pstats</span> <span>import</span> <span>SortKey</span>
<span>def</span> <span>profile_code</span><span>():</span>
<span>cProfile</span><span>.</span><span>run</span><span>(</span><span>'</span><span>slow_function()</span><span>'</span><span>,</span> <span>'</span><span>stats.prof</span><span>'</span><span>)</span>
<span>p</span> <span>=</span> <span>pstats</span><span>.</span><span>Stats</span><span>(</span><span>'</span><span>stats.prof</span><span>'</span><span>)</span>
<span>p</span><span>.</span><span>strip_dirs</span><span>().</span><span>sort_stats</span><span>(</span><span>SortKey</span><span>.</span><span>CUMULATIVE</span><span>).</span><span>print_stats</span><span>(</span><span>10</span><span>)</span>
<span># This outputs call counts and timing information for the top 10 functions </span>
<span>import</span> <span>cProfile</span>
<span>import</span> <span>pstats</span>
<span>from</span> <span>pstats</span> <span>import</span> <span>SortKey</span>

<span>def</span> <span>profile_code</span><span>():</span>
    <span>cProfile</span><span>.</span><span>run</span><span>(</span><span>'</span><span>slow_function()</span><span>'</span><span>,</span> <span>'</span><span>stats.prof</span><span>'</span><span>)</span>
    <span>p</span> <span>=</span> <span>pstats</span><span>.</span><span>Stats</span><span>(</span><span>'</span><span>stats.prof</span><span>'</span><span>)</span>
    <span>p</span><span>.</span><span>strip_dirs</span><span>().</span><span>sort_stats</span><span>(</span><span>SortKey</span><span>.</span><span>CUMULATIVE</span><span>).</span><span>print_stats</span><span>(</span><span>10</span><span>)</span>

<span># This outputs call counts and timing information for the top 10 functions </span>
import cProfile import pstats from pstats import SortKey def profile_code(): cProfile.run('slow_function()', 'stats.prof') p = pstats.Stats('stats.prof') p.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(10) # This outputs call counts and timing information for the top 10 functions

Enter fullscreen mode Exit fullscreen mode

When I first started optimizing a data processing pipeline, cProfile helped me identify that a seemingly innocent string operation was being called millions of times. This discovery led to a simple optimization that reduced processing time by 30%.

py-spy: Low-overhead Sampling Profiler

While cProfile is comprehensive, it introduces significant overhead. For production environments, py-spy offers a better alternative. It works by sampling the Python call stack without modifying your code or significantly impacting performance.

<span># Install with: pip install py-spy # Then run from command line: # py-spy record -o profile.svg --pid 12345 </span>
<span># Or programmatically: </span><span>import</span> <span>subprocess</span>
<span>import</span> <span>os</span>
<span>def</span> <span>profile_running_application</span><span>(</span><span>pid</span><span>,</span> <span>duration</span><span>=</span><span>30</span><span>):</span>
<span>subprocess</span><span>.</span><span>call</span><span>([</span>
<span>"</span><span>py-spy</span><span>"</span><span>,</span> <span>"</span><span>record</span><span>"</span><span>,</span>
<span>"</span><span>-o</span><span>"</span><span>,</span> <span>"</span><span>profile.svg</span><span>"</span><span>,</span>
<span>"</span><span>--pid</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>pid</span><span>),</span>
<span>"</span><span>--duration</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>duration</span><span>)</span>
<span>])</span>
<span># Install with: pip install py-spy # Then run from command line: # py-spy record -o profile.svg --pid 12345 </span>
<span># Or programmatically: </span><span>import</span> <span>subprocess</span>
<span>import</span> <span>os</span>

<span>def</span> <span>profile_running_application</span><span>(</span><span>pid</span><span>,</span> <span>duration</span><span>=</span><span>30</span><span>):</span>
    <span>subprocess</span><span>.</span><span>call</span><span>([</span>
        <span>"</span><span>py-spy</span><span>"</span><span>,</span> <span>"</span><span>record</span><span>"</span><span>,</span>
        <span>"</span><span>-o</span><span>"</span><span>,</span> <span>"</span><span>profile.svg</span><span>"</span><span>,</span>
        <span>"</span><span>--pid</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>pid</span><span>),</span>
        <span>"</span><span>--duration</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>duration</span><span>)</span>
    <span>])</span>
# Install with: pip install py-spy # Then run from command line: # py-spy record -o profile.svg --pid 12345 # Or programmatically: import subprocess import os def profile_running_application(pid, duration=30): subprocess.call([ "py-spy", "record", "-o", "profile.svg", "--pid", str(pid), "--duration", str(duration) ])

Enter fullscreen mode Exit fullscreen mode

I once used py-spy to diagnose a production issue where an API was gradually slowing down. The generated flame graph immediately revealed that the database connection pool was exhausted, leading to connection wait times that weren’t visible in our regular metrics.

memray: Memory Profiling Made Simple

Memory leaks and excessive memory usage can cripple Python applications. memray is a powerful tool specifically designed for tracking memory usage in Python programs.

<span># Install with: pip install memray # Then run from command line: # python -m memray run my_script.py </span>
<span># For live tracking: </span><span>import</span> <span>memray</span>
<span>def</span> <span>memory_intensive_function</span><span>():</span>
<span>big_list</span> <span>=</span> <span>[</span><span>0</span><span>]</span> <span>*</span> <span>10000000</span>
<span># Do something with big_list </span> <span>return</span> <span>sum</span><span>(</span><span>big_list</span><span>)</span>
<span>with</span> <span>memray</span><span>.</span><span>Tracker</span><span>(</span><span>"</span><span>memory_profile.bin</span><span>"</span><span>):</span>
<span>memory_intensive_function</span><span>()</span>
<span># Later analyze with: # memray flamegraph memory_profile.bin </span>
<span># Install with: pip install memray # Then run from command line: # python -m memray run my_script.py </span>
<span># For live tracking: </span><span>import</span> <span>memray</span>

<span>def</span> <span>memory_intensive_function</span><span>():</span>
    <span>big_list</span> <span>=</span> <span>[</span><span>0</span><span>]</span> <span>*</span> <span>10000000</span>
    <span># Do something with big_list </span>    <span>return</span> <span>sum</span><span>(</span><span>big_list</span><span>)</span>

<span>with</span> <span>memray</span><span>.</span><span>Tracker</span><span>(</span><span>"</span><span>memory_profile.bin</span><span>"</span><span>):</span>
    <span>memory_intensive_function</span><span>()</span>

<span># Later analyze with: # memray flamegraph memory_profile.bin </span>
# Install with: pip install memray # Then run from command line: # python -m memray run my_script.py # For live tracking: import memray def memory_intensive_function(): big_list = [0] * 10000000 # Do something with big_list return sum(big_list) with memray.Tracker("memory_profile.bin"): memory_intensive_function() # Later analyze with: # memray flamegraph memory_profile.bin

Enter fullscreen mode Exit fullscreen mode

When debugging a machine learning application that was crashing with out-of-memory errors, memray helped me identify that intermediate results weren’t being garbage collected due to a circular reference. Fixing this reduced memory usage by 60%.

OpenTelemetry: Distributed Tracing

Modern applications often span multiple services. OpenTelemetry provides a framework for distributed tracing, which is essential for understanding performance across service boundaries.

<span>from</span> <span>opentelemetry</span> <span>import</span> <span>trace</span>
<span>from</span> <span>opentelemetry.sdk.trace</span> <span>import</span> <span>TracerProvider</span>
<span>from</span> <span>opentelemetry.sdk.trace.export</span> <span>import</span> <span>BatchSpanProcessor</span>
<span>from</span> <span>opentelemetry.exporter.otlp.proto.grpc.trace_exporter</span> <span>import</span> <span>OTLPSpanExporter</span>
<span># Setup </span><span>trace</span><span>.</span><span>set_tracer_provider</span><span>(</span><span>TracerProvider</span><span>())</span>
<span>tracer</span> <span>=</span> <span>trace</span><span>.</span><span>get_tracer</span><span>(</span><span>__name__</span><span>)</span>
<span>processor</span> <span>=</span> <span>BatchSpanProcessor</span><span>(</span><span>OTLPSpanExporter</span><span>(</span><span>endpoint</span><span>=</span><span>"</span><span>otel-collector:4317</span><span>"</span><span>))</span>
<span>trace</span><span>.</span><span>get_tracer_provider</span><span>().</span><span>add_span_processor</span><span>(</span><span>processor</span><span>)</span>
<span># Usage </span><span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>process_order</span><span>"</span><span>)</span>
<span>def</span> <span>process_order</span><span>(</span><span>order_id</span><span>):</span>
<span># Code here is automatically traced </span> <span>validate_order</span><span>(</span><span>order_id</span><span>)</span>
<span>update_inventory</span><span>(</span><span>order_id</span><span>)</span>
<span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>validate_order</span><span>"</span><span>)</span>
<span>def</span> <span>validate_order</span><span>(</span><span>order_id</span><span>):</span>
<span># This creates a child span </span> <span>pass</span>
<span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>update_inventory</span><span>"</span><span>)</span>
<span>def</span> <span>update_inventory</span><span>(</span><span>order_id</span><span>):</span>
<span># This creates another child span </span> <span>pass</span>
<span>from</span> <span>opentelemetry</span> <span>import</span> <span>trace</span>
<span>from</span> <span>opentelemetry.sdk.trace</span> <span>import</span> <span>TracerProvider</span>
<span>from</span> <span>opentelemetry.sdk.trace.export</span> <span>import</span> <span>BatchSpanProcessor</span>
<span>from</span> <span>opentelemetry.exporter.otlp.proto.grpc.trace_exporter</span> <span>import</span> <span>OTLPSpanExporter</span>

<span># Setup </span><span>trace</span><span>.</span><span>set_tracer_provider</span><span>(</span><span>TracerProvider</span><span>())</span>
<span>tracer</span> <span>=</span> <span>trace</span><span>.</span><span>get_tracer</span><span>(</span><span>__name__</span><span>)</span>
<span>processor</span> <span>=</span> <span>BatchSpanProcessor</span><span>(</span><span>OTLPSpanExporter</span><span>(</span><span>endpoint</span><span>=</span><span>"</span><span>otel-collector:4317</span><span>"</span><span>))</span>
<span>trace</span><span>.</span><span>get_tracer_provider</span><span>().</span><span>add_span_processor</span><span>(</span><span>processor</span><span>)</span>

<span># Usage </span><span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>process_order</span><span>"</span><span>)</span>
<span>def</span> <span>process_order</span><span>(</span><span>order_id</span><span>):</span>
    <span># Code here is automatically traced </span>    <span>validate_order</span><span>(</span><span>order_id</span><span>)</span>
    <span>update_inventory</span><span>(</span><span>order_id</span><span>)</span>

<span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>validate_order</span><span>"</span><span>)</span>
<span>def</span> <span>validate_order</span><span>(</span><span>order_id</span><span>):</span>
    <span># This creates a child span </span>    <span>pass</span>

<span>@tracer.start_as_current_span</span><span>(</span><span>"</span><span>update_inventory</span><span>"</span><span>)</span>
<span>def</span> <span>update_inventory</span><span>(</span><span>order_id</span><span>):</span>
    <span># This creates another child span </span>    <span>pass</span>
from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter # Setup trace.set_tracer_provider(TracerProvider()) tracer = trace.get_tracer(__name__) processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317")) trace.get_tracer_provider().add_span_processor(processor) # Usage @tracer.start_as_current_span("process_order") def process_order(order_id): # Code here is automatically traced validate_order(order_id) update_inventory(order_id) @tracer.start_as_current_span("validate_order") def validate_order(order_id): # This creates a child span pass @tracer.start_as_current_span("update_inventory") def update_inventory(order_id): # This creates another child span pass

Enter fullscreen mode Exit fullscreen mode

In a microservices architecture I worked on, implementing OpenTelemetry revealed that what we thought was a slow database query was actually latency introduced by a network hop between services. This insight completely changed our optimization approach.

Pyroscope: Continuous Profiling

Pyroscope enables continuous profiling, allowing you to track performance changes over time. This is crucial for identifying gradual degradations before they become critical issues.

<span># Install with: pip install pyroscope-io </span><span>import</span> <span>pyroscope</span>
<span>import</span> <span>time</span>
<span># Initialize profiler </span><span>pyroscope</span><span>.</span><span>configure</span><span>(</span>
<span>application_name</span><span>=</span><span>"</span><span>my_service</span><span>"</span><span>,</span>
<span>server_address</span><span>=</span><span>"</span><span>http://pyroscope-server:4040</span><span>"</span><span>,</span>
<span>tags</span><span>=</span><span>{</span><span>"</span><span>environment</span><span>"</span><span>:</span> <span>"</span><span>production</span><span>"</span><span>}</span>
<span>)</span>
<span># Automatically profile application </span><span>def</span> <span>main</span><span>():</span>
<span>while</span> <span>True</span><span>:</span>
<span># Your application code </span> <span>process_data</span><span>()</span>
<span>time</span><span>.</span><span>sleep</span><span>(</span><span>1</span><span>)</span>
<span>@pyroscope.tag</span><span>(</span><span>"</span><span>subsystem</span><span>"</span><span>,</span> <span>"</span><span>data_processor</span><span>"</span><span>)</span>
<span>def</span> <span>process_data</span><span>():</span>
<span># This function's performance will be tagged in Pyroscope </span> <span>data</span> <span>=</span> <span>[</span><span>i</span> <span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>10000</span><span>)]</span>
<span>sorted_data</span> <span>=</span> <span>sorted</span><span>(</span><span>data</span><span>)</span>
<span>return</span> <span>sorted_data</span>
<span>if</span> <span>__name__</span> <span>==</span> <span>"</span><span>__main__</span><span>"</span><span>:</span>
<span>main</span><span>()</span>
<span># Install with: pip install pyroscope-io </span><span>import</span> <span>pyroscope</span>
<span>import</span> <span>time</span>

<span># Initialize profiler </span><span>pyroscope</span><span>.</span><span>configure</span><span>(</span>
    <span>application_name</span><span>=</span><span>"</span><span>my_service</span><span>"</span><span>,</span>
    <span>server_address</span><span>=</span><span>"</span><span>http://pyroscope-server:4040</span><span>"</span><span>,</span>
    <span>tags</span><span>=</span><span>{</span><span>"</span><span>environment</span><span>"</span><span>:</span> <span>"</span><span>production</span><span>"</span><span>}</span>
<span>)</span>

<span># Automatically profile application </span><span>def</span> <span>main</span><span>():</span>
    <span>while</span> <span>True</span><span>:</span>
        <span># Your application code </span>        <span>process_data</span><span>()</span>
        <span>time</span><span>.</span><span>sleep</span><span>(</span><span>1</span><span>)</span>

<span>@pyroscope.tag</span><span>(</span><span>"</span><span>subsystem</span><span>"</span><span>,</span> <span>"</span><span>data_processor</span><span>"</span><span>)</span>
<span>def</span> <span>process_data</span><span>():</span>
    <span># This function's performance will be tagged in Pyroscope </span>    <span>data</span> <span>=</span> <span>[</span><span>i</span> <span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>10000</span><span>)]</span>
    <span>sorted_data</span> <span>=</span> <span>sorted</span><span>(</span><span>data</span><span>)</span>
    <span>return</span> <span>sorted_data</span>

<span>if</span> <span>__name__</span> <span>==</span> <span>"</span><span>__main__</span><span>"</span><span>:</span>
    <span>main</span><span>()</span>
# Install with: pip install pyroscope-io import pyroscope import time # Initialize profiler pyroscope.configure( application_name="my_service", server_address="http://pyroscope-server:4040", tags={"environment": "production"} ) # Automatically profile application def main(): while True: # Your application code process_data() time.sleep(1) @pyroscope.tag("subsystem", "data_processor") def process_data(): # This function's performance will be tagged in Pyroscope data = [i for i in range(10000)] sorted_data = sorted(data) return sorted_data if __name__ == "__main__": main()

Enter fullscreen mode Exit fullscreen mode

The ability to compare profiles over time with Pyroscope helped my team identify a performance regression introduced by a dependency upgrade. We were able to address it before users noticed any slowdown.

Prometheus: Metrics Collection

Prometheus has become the standard for collecting and alerting on application metrics. The Python client library makes it easy to expose custom metrics from your application.

<span>from</span> <span>prometheus_client</span> <span>import</span> <span>start_http_server</span><span>,</span> <span>Counter</span><span>,</span> <span>Histogram</span>
<span>import</span> <span>random</span>
<span>import</span> <span>time</span>
<span># Create metrics </span><span>REQUEST_COUNT</span> <span>=</span> <span>Counter</span><span>(</span><span>'</span><span>request_count</span><span>'</span><span>,</span> <span>'</span><span>Total request count</span><span>'</span><span>)</span>
<span>REQUEST_LATENCY</span> <span>=</span> <span>Histogram</span><span>(</span><span>'</span><span>request_latency_seconds</span><span>'</span><span>,</span> <span>'</span><span>Request latency in seconds</span><span>'</span><span>)</span>
<span># Start server to expose metrics </span><span>start_http_server</span><span>(</span><span>8000</span><span>)</span>
<span># Simulate request handling </span><span>def</span> <span>handle_request</span><span>():</span>
<span>REQUEST_COUNT</span><span>.</span><span>inc</span><span>()</span>
<span>with</span> <span>REQUEST_LATENCY</span><span>.</span><span>time</span><span>():</span>
<span># Simulate work </span> <span>time</span><span>.</span><span>sleep</span><span>(</span><span>random</span><span>.</span><span>random</span><span>())</span>
<span># Main loop </span><span>while</span> <span>True</span><span>:</span>
<span>handle_request</span><span>()</span>
<span>time</span><span>.</span><span>sleep</span><span>(</span><span>1</span><span>)</span>
<span>from</span> <span>prometheus_client</span> <span>import</span> <span>start_http_server</span><span>,</span> <span>Counter</span><span>,</span> <span>Histogram</span>
<span>import</span> <span>random</span>
<span>import</span> <span>time</span>

<span># Create metrics </span><span>REQUEST_COUNT</span> <span>=</span> <span>Counter</span><span>(</span><span>'</span><span>request_count</span><span>'</span><span>,</span> <span>'</span><span>Total request count</span><span>'</span><span>)</span>
<span>REQUEST_LATENCY</span> <span>=</span> <span>Histogram</span><span>(</span><span>'</span><span>request_latency_seconds</span><span>'</span><span>,</span> <span>'</span><span>Request latency in seconds</span><span>'</span><span>)</span>

<span># Start server to expose metrics </span><span>start_http_server</span><span>(</span><span>8000</span><span>)</span>

<span># Simulate request handling </span><span>def</span> <span>handle_request</span><span>():</span>
    <span>REQUEST_COUNT</span><span>.</span><span>inc</span><span>()</span>
    <span>with</span> <span>REQUEST_LATENCY</span><span>.</span><span>time</span><span>():</span>
        <span># Simulate work </span>        <span>time</span><span>.</span><span>sleep</span><span>(</span><span>random</span><span>.</span><span>random</span><span>())</span>

<span># Main loop </span><span>while</span> <span>True</span><span>:</span>
    <span>handle_request</span><span>()</span>
    <span>time</span><span>.</span><span>sleep</span><span>(</span><span>1</span><span>)</span>
from prometheus_client import start_http_server, Counter, Histogram import random import time # Create metrics REQUEST_COUNT = Counter('request_count', 'Total request count') REQUEST_LATENCY = Histogram('request_latency_seconds', 'Request latency in seconds') # Start server to expose metrics start_http_server(8000) # Simulate request handling def handle_request(): REQUEST_COUNT.inc() with REQUEST_LATENCY.time(): # Simulate work time.sleep(random.random()) # Main loop while True: handle_request() time.sleep(1)

Enter fullscreen mode Exit fullscreen mode

Implementing Prometheus metrics in a critical API service allowed us to set up alerts for SLA violations. This proactive approach reduced our mean time to detection for performance issues from hours to minutes.

Scalene: High-precision CPU and Memory Profiling

Scalene offers high-precision profiling that accurately accounts for time spent in CPU, memory operations, and I/O waiting. This provides a more complete picture of performance bottlenecks.

<span># Install with: pip install scalene # Run from command line: # python -m scalene your_program.py </span>
<span># For programmatic use: </span><span>import</span> <span>scalene.scalene_profiler</span>
<span>def</span> <span>main</span><span>():</span>
<span>scalene</span><span>.</span><span>scalene_profiler</span><span>.</span><span>start</span><span>()</span>
<span># Your code here </span> <span>compute_intensive_task</span><span>()</span>
<span>io_intensive_task</span><span>()</span>
<span>scalene</span><span>.</span><span>scalene_profiler</span><span>.</span><span>stop</span><span>()</span>
<span>def</span> <span>compute_intensive_task</span><span>():</span>
<span>result</span> <span>=</span> <span>0</span>
<span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>10000000</span><span>):</span>
<span>result</span> <span>+=</span> <span>i</span>
<span>return</span> <span>result</span>
<span>def</span> <span>io_intensive_task</span><span>():</span>
<span>with</span> <span>open</span><span>(</span><span>'</span><span>large_file.txt</span><span>'</span><span>,</span> <span>'</span><span>r</span><span>'</span><span>)</span> <span>as</span> <span>f</span><span>:</span>
<span>data</span> <span>=</span> <span>f</span><span>.</span><span>read</span><span>()</span>
<span>return</span> <span>len</span><span>(</span><span>data</span><span>)</span>
<span>if</span> <span>__name__</span> <span>==</span> <span>"</span><span>__main__</span><span>"</span><span>:</span>
<span>main</span><span>()</span>
<span># Install with: pip install scalene # Run from command line: # python -m scalene your_program.py </span>
<span># For programmatic use: </span><span>import</span> <span>scalene.scalene_profiler</span>

<span>def</span> <span>main</span><span>():</span>
    <span>scalene</span><span>.</span><span>scalene_profiler</span><span>.</span><span>start</span><span>()</span>

    <span># Your code here </span>    <span>compute_intensive_task</span><span>()</span>
    <span>io_intensive_task</span><span>()</span>

    <span>scalene</span><span>.</span><span>scalene_profiler</span><span>.</span><span>stop</span><span>()</span>

<span>def</span> <span>compute_intensive_task</span><span>():</span>
    <span>result</span> <span>=</span> <span>0</span>
    <span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>10000000</span><span>):</span>
        <span>result</span> <span>+=</span> <span>i</span>
    <span>return</span> <span>result</span>

<span>def</span> <span>io_intensive_task</span><span>():</span>
    <span>with</span> <span>open</span><span>(</span><span>'</span><span>large_file.txt</span><span>'</span><span>,</span> <span>'</span><span>r</span><span>'</span><span>)</span> <span>as</span> <span>f</span><span>:</span>
        <span>data</span> <span>=</span> <span>f</span><span>.</span><span>read</span><span>()</span>
    <span>return</span> <span>len</span><span>(</span><span>data</span><span>)</span>

<span>if</span> <span>__name__</span> <span>==</span> <span>"</span><span>__main__</span><span>"</span><span>:</span>
    <span>main</span><span>()</span>
# Install with: pip install scalene # Run from command line: # python -m scalene your_program.py # For programmatic use: import scalene.scalene_profiler def main(): scalene.scalene_profiler.start() # Your code here compute_intensive_task() io_intensive_task() scalene.scalene_profiler.stop() def compute_intensive_task(): result = 0 for i in range(10000000): result += i return result def io_intensive_task(): with open('large_file.txt', 'r') as f: data = f.read() return len(data) if __name__ == "__main__": main()

Enter fullscreen mode Exit fullscreen mode

What sets Scalene apart is its ability to differentiate between CPU time and waiting time. In a data processing pipeline I optimized, Scalene revealed that what appeared to be a CPU bottleneck was actually time spent waiting for I/O operations. This insight led to implementing concurrent processing that improved throughput by 3x.

Flame Graphs: Visualizing Performance Data

Flame graphs provide an intuitive way to visualize profiling data. They make it easy to identify “hot” code paths that consume disproportionate resources.

<span># Using py-spy to generate a flame graph </span><span>import</span> <span>subprocess</span>
<span>def</span> <span>generate_flame_graph</span><span>(</span><span>pid</span><span>,</span> <span>output</span><span>=</span><span>"</span><span>flamegraph.svg</span><span>"</span><span>,</span> <span>duration</span><span>=</span><span>30</span><span>):</span>
<span>subprocess</span><span>.</span><span>call</span><span>([</span>
<span>"</span><span>py-spy</span><span>"</span><span>,</span> <span>"</span><span>record</span><span>"</span><span>,</span>
<span>"</span><span>--format</span><span>"</span><span>,</span> <span>"</span><span>flamegraph</span><span>"</span><span>,</span>
<span>"</span><span>-o</span><span>"</span><span>,</span> <span>output</span><span>,</span>
<span>"</span><span>--pid</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>pid</span><span>),</span>
<span>"</span><span>--duration</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>duration</span><span>)</span>
<span>])</span>
<span># Using Speedscope with cProfile </span><span>import</span> <span>cProfile</span>
<span>import</span> <span>subprocess</span>
<span>def</span> <span>profile_with_speedscope</span><span>(</span><span>func</span><span>,</span> <span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>):</span>
<span>profile_file</span> <span>=</span> <span>"</span><span>profile.prof</span><span>"</span>
<span>cProfile</span><span>.</span><span>runctx</span><span>(</span><span>"</span><span>result = func(*args, **kwargs)</span><span>"</span><span>,</span> <span>globals</span><span>(),</span> <span>locals</span><span>(),</span> <span>profile_file</span><span>)</span>
<span># Convert to speedscope format (requires pyspeedscope) </span> <span>subprocess</span><span>.</span><span>call</span><span>([</span><span>"</span><span>pyspeedscope</span><span>"</span><span>,</span> <span>profile_file</span><span>,</span> <span>"</span><span>-o</span><span>"</span><span>,</span> <span>"</span><span>profile.speedscope.json</span><span>"</span><span>])</span>
<span>return</span> <span>locals</span><span>()[</span><span>"</span><span>result</span><span>"</span><span>]</span>
<span># Using py-spy to generate a flame graph </span><span>import</span> <span>subprocess</span>

<span>def</span> <span>generate_flame_graph</span><span>(</span><span>pid</span><span>,</span> <span>output</span><span>=</span><span>"</span><span>flamegraph.svg</span><span>"</span><span>,</span> <span>duration</span><span>=</span><span>30</span><span>):</span>
    <span>subprocess</span><span>.</span><span>call</span><span>([</span>
        <span>"</span><span>py-spy</span><span>"</span><span>,</span> <span>"</span><span>record</span><span>"</span><span>,</span>
        <span>"</span><span>--format</span><span>"</span><span>,</span> <span>"</span><span>flamegraph</span><span>"</span><span>,</span>
        <span>"</span><span>-o</span><span>"</span><span>,</span> <span>output</span><span>,</span>
        <span>"</span><span>--pid</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>pid</span><span>),</span>
        <span>"</span><span>--duration</span><span>"</span><span>,</span> <span>str</span><span>(</span><span>duration</span><span>)</span>
    <span>])</span>

<span># Using Speedscope with cProfile </span><span>import</span> <span>cProfile</span>
<span>import</span> <span>subprocess</span>

<span>def</span> <span>profile_with_speedscope</span><span>(</span><span>func</span><span>,</span> <span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>):</span>
    <span>profile_file</span> <span>=</span> <span>"</span><span>profile.prof</span><span>"</span>
    <span>cProfile</span><span>.</span><span>runctx</span><span>(</span><span>"</span><span>result = func(*args, **kwargs)</span><span>"</span><span>,</span> <span>globals</span><span>(),</span> <span>locals</span><span>(),</span> <span>profile_file</span><span>)</span>
    <span># Convert to speedscope format (requires pyspeedscope) </span>    <span>subprocess</span><span>.</span><span>call</span><span>([</span><span>"</span><span>pyspeedscope</span><span>"</span><span>,</span> <span>profile_file</span><span>,</span> <span>"</span><span>-o</span><span>"</span><span>,</span> <span>"</span><span>profile.speedscope.json</span><span>"</span><span>])</span>
    <span>return</span> <span>locals</span><span>()[</span><span>"</span><span>result</span><span>"</span><span>]</span>
# Using py-spy to generate a flame graph import subprocess def generate_flame_graph(pid, output="flamegraph.svg", duration=30): subprocess.call([ "py-spy", "record", "--format", "flamegraph", "-o", output, "--pid", str(pid), "--duration", str(duration) ]) # Using Speedscope with cProfile import cProfile import subprocess def profile_with_speedscope(func, *args, **kwargs): profile_file = "profile.prof" cProfile.runctx("result = func(*args, **kwargs)", globals(), locals(), profile_file) # Convert to speedscope format (requires pyspeedscope) subprocess.call(["pyspeedscope", profile_file, "-o", "profile.speedscope.json"]) return locals()["result"]

Enter fullscreen mode Exit fullscreen mode

The first time I used flame graphs to analyze a Django application, I was surprised to find that template rendering was consuming more CPU than database queries. This visual representation made the bottleneck obvious in a way that raw numbers never could.

Implementing Profiling in Production

Implementing profiling in production requires careful consideration of overhead and security implications. Here’s a practical approach:

<span>import</span> <span>os</span>
<span>import</span> <span>cProfile</span>
<span>import</span> <span>random</span>
<span>class</span> <span>ConditionalProfiler</span><span>:</span>
<span>def</span> <span>__init__</span><span>(</span><span>self</span><span>,</span> <span>sample_rate</span><span>=</span><span>0.01</span><span>,</span> <span>profile_dir</span><span>=</span><span>"</span><span>/tmp/profiles</span><span>"</span><span>):</span>
<span>self</span><span>.</span><span>sample_rate</span> <span>=</span> <span>sample_rate</span>
<span>self</span><span>.</span><span>profile_dir</span> <span>=</span> <span>profile_dir</span>
<span>os</span><span>.</span><span>makedirs</span><span>(</span><span>profile_dir</span><span>,</span> <span>exist_ok</span><span>=</span><span>True</span><span>)</span>
<span>def</span> <span>__call__</span><span>(</span><span>self</span><span>,</span> <span>func</span><span>):</span>
<span>def</span> <span>wrapped</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>):</span>
<span># Only profile a small percentage of calls </span> <span>if</span> <span>random</span><span>.</span><span>random</span><span>()</span> <span><</span> <span>self</span><span>.</span><span>sample_rate</span><span>:</span>
<span>profile_path</span> <span>=</span> <span>f</span><span>"</span><span>{</span><span>self</span><span>.</span><span>profile_dir</span><span>}</span><span>/</span><span>{</span><span>func</span><span>.</span><span>__name__</span><span>}</span><span>_</span><span>{</span><span>os</span><span>.</span><span>getpid</span><span>()</span><span>}</span><span>_</span><span>{</span><span>int</span><span>(</span><span>time</span><span>.</span><span>time</span><span>())</span><span>}</span><span>.prof</span><span>"</span>
<span>profiler</span> <span>=</span> <span>cProfile</span><span>.</span><span>Profile</span><span>()</span>
<span>profiler</span><span>.</span><span>enable</span><span>()</span>
<span>try</span><span>:</span>
<span>result</span> <span>=</span> <span>func</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>)</span>
<span>finally</span><span>:</span>
<span>profiler</span><span>.</span><span>disable</span><span>()</span>
<span>profiler</span><span>.</span><span>dump_stats</span><span>(</span><span>profile_path</span><span>)</span>
<span>return</span> <span>result</span>
<span>else</span><span>:</span>
<span>return</span> <span>func</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>)</span>
<span>return</span> <span>wrapped</span>
<span># Usage </span><span>@ConditionalProfiler</span><span>(</span><span>sample_rate</span><span>=</span><span>0.05</span><span>)</span>
<span>def</span> <span>expensive_operation</span><span>(</span><span>data</span><span>):</span>
<span># Function body </span> <span>pass</span>
<span>import</span> <span>os</span>
<span>import</span> <span>cProfile</span>
<span>import</span> <span>random</span>

<span>class</span> <span>ConditionalProfiler</span><span>:</span>
    <span>def</span> <span>__init__</span><span>(</span><span>self</span><span>,</span> <span>sample_rate</span><span>=</span><span>0.01</span><span>,</span> <span>profile_dir</span><span>=</span><span>"</span><span>/tmp/profiles</span><span>"</span><span>):</span>
        <span>self</span><span>.</span><span>sample_rate</span> <span>=</span> <span>sample_rate</span>
        <span>self</span><span>.</span><span>profile_dir</span> <span>=</span> <span>profile_dir</span>
        <span>os</span><span>.</span><span>makedirs</span><span>(</span><span>profile_dir</span><span>,</span> <span>exist_ok</span><span>=</span><span>True</span><span>)</span>

    <span>def</span> <span>__call__</span><span>(</span><span>self</span><span>,</span> <span>func</span><span>):</span>
        <span>def</span> <span>wrapped</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>):</span>
            <span># Only profile a small percentage of calls </span>            <span>if</span> <span>random</span><span>.</span><span>random</span><span>()</span> <span><</span> <span>self</span><span>.</span><span>sample_rate</span><span>:</span>
                <span>profile_path</span> <span>=</span> <span>f</span><span>"</span><span>{</span><span>self</span><span>.</span><span>profile_dir</span><span>}</span><span>/</span><span>{</span><span>func</span><span>.</span><span>__name__</span><span>}</span><span>_</span><span>{</span><span>os</span><span>.</span><span>getpid</span><span>()</span><span>}</span><span>_</span><span>{</span><span>int</span><span>(</span><span>time</span><span>.</span><span>time</span><span>())</span><span>}</span><span>.prof</span><span>"</span>
                <span>profiler</span> <span>=</span> <span>cProfile</span><span>.</span><span>Profile</span><span>()</span>
                <span>profiler</span><span>.</span><span>enable</span><span>()</span>
                <span>try</span><span>:</span>
                    <span>result</span> <span>=</span> <span>func</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>)</span>
                <span>finally</span><span>:</span>
                    <span>profiler</span><span>.</span><span>disable</span><span>()</span>
                    <span>profiler</span><span>.</span><span>dump_stats</span><span>(</span><span>profile_path</span><span>)</span>
                <span>return</span> <span>result</span>
            <span>else</span><span>:</span>
                <span>return</span> <span>func</span><span>(</span><span>*</span><span>args</span><span>,</span> <span>**</span><span>kwargs</span><span>)</span>
        <span>return</span> <span>wrapped</span>

<span># Usage </span><span>@ConditionalProfiler</span><span>(</span><span>sample_rate</span><span>=</span><span>0.05</span><span>)</span>
<span>def</span> <span>expensive_operation</span><span>(</span><span>data</span><span>):</span>
    <span># Function body </span>    <span>pass</span>
import os import cProfile import random class ConditionalProfiler: def __init__(self, sample_rate=0.01, profile_dir="/tmp/profiles"): self.sample_rate = sample_rate self.profile_dir = profile_dir os.makedirs(profile_dir, exist_ok=True) def __call__(self, func): def wrapped(*args, **kwargs): # Only profile a small percentage of calls if random.random() < self.sample_rate: profile_path = f"{self.profile_dir}/{func.__name__}_{os.getpid()}_{int(time.time())}.prof" profiler = cProfile.Profile() profiler.enable() try: result = func(*args, **kwargs) finally: profiler.disable() profiler.dump_stats(profile_path) return result else: return func(*args, **kwargs) return wrapped # Usage @ConditionalProfiler(sample_rate=0.05) def expensive_operation(data): # Function body pass

Enter fullscreen mode Exit fullscreen mode

This sampling-based approach has served me well in high-throughput production environments. By profiling only a small percentage of requests, we get valuable performance data with minimal overhead.

Continuous Performance Monitoring

Setting up continuous performance monitoring involves integrating these profiling tools into your observability pipeline:

<span>from</span> <span>flask</span> <span>import</span> <span>Flask</span><span>,</span> <span>request</span>
<span>import</span> <span>time</span>
<span>import</span> <span>prometheus_client</span>
<span>from</span> <span>werkzeug.middleware.dispatcher</span> <span>import</span> <span>DispatcherMiddleware</span>
<span>from</span> <span>prometheus_client</span> <span>import</span> <span>make_wsgi_app</span>
<span># Create Flask app </span><span>app</span> <span>=</span> <span>Flask</span><span>(</span><span>__name__</span><span>)</span>
<span># Setup Prometheus metrics </span><span>REQUEST_TIME</span> <span>=</span> <span>prometheus_client</span><span>.</span><span>Summary</span><span>(</span><span>'</span><span>request_processing_seconds</span><span>'</span><span>,</span>
<span>'</span><span>Time spent processing request</span><span>'</span><span>,</span>
<span>[</span><span>'</span><span>endpoint</span><span>'</span><span>])</span>
<span># Add prometheus wsgi middleware </span><span>app</span><span>.</span><span>wsgi_app</span> <span>=</span> <span>DispatcherMiddleware</span><span>(</span><span>app</span><span>.</span><span>wsgi_app</span><span>,</span> <span>{</span>
<span>'</span><span>/metrics</span><span>'</span><span>:</span> <span>make_wsgi_app</span><span>()</span>
<span>})</span>
<span>@app.route</span><span>(</span><span>'</span><span>/api/data</span><span>'</span><span>)</span>
<span>def</span> <span>get_data</span><span>():</span>
<span>start_time</span> <span>=</span> <span>time</span><span>.</span><span>time</span><span>()</span>
<span># Process request </span> <span>result</span> <span>=</span> <span>{</span><span>"</span><span>data</span><span>"</span><span>:</span> <span>"</span><span>example</span><span>"</span><span>}</span>
<span>REQUEST_TIME</span><span>.</span><span>labels</span><span>(</span><span>endpoint</span><span>=</span><span>'</span><span>/api/data</span><span>'</span><span>).</span><span>observe</span><span>(</span><span>time</span><span>.</span><span>time</span><span>()</span> <span>-</span> <span>start_time</span><span>)</span>
<span>return</span> <span>result</span>
<span>if</span> <span>__name__</span> <span>==</span> <span>'</span><span>__main__</span><span>'</span><span>:</span>
<span>app</span><span>.</span><span>run</span><span>(</span><span>host</span><span>=</span><span>'</span><span>0.0.0.0</span><span>'</span><span>,</span> <span>port</span><span>=</span><span>8000</span><span>)</span>
<span>from</span> <span>flask</span> <span>import</span> <span>Flask</span><span>,</span> <span>request</span>
<span>import</span> <span>time</span>
<span>import</span> <span>prometheus_client</span>
<span>from</span> <span>werkzeug.middleware.dispatcher</span> <span>import</span> <span>DispatcherMiddleware</span>
<span>from</span> <span>prometheus_client</span> <span>import</span> <span>make_wsgi_app</span>

<span># Create Flask app </span><span>app</span> <span>=</span> <span>Flask</span><span>(</span><span>__name__</span><span>)</span>

<span># Setup Prometheus metrics </span><span>REQUEST_TIME</span> <span>=</span> <span>prometheus_client</span><span>.</span><span>Summary</span><span>(</span><span>'</span><span>request_processing_seconds</span><span>'</span><span>,</span> 
                                         <span>'</span><span>Time spent processing request</span><span>'</span><span>,</span>
                                         <span>[</span><span>'</span><span>endpoint</span><span>'</span><span>])</span>

<span># Add prometheus wsgi middleware </span><span>app</span><span>.</span><span>wsgi_app</span> <span>=</span> <span>DispatcherMiddleware</span><span>(</span><span>app</span><span>.</span><span>wsgi_app</span><span>,</span> <span>{</span>
    <span>'</span><span>/metrics</span><span>'</span><span>:</span> <span>make_wsgi_app</span><span>()</span>
<span>})</span>

<span>@app.route</span><span>(</span><span>'</span><span>/api/data</span><span>'</span><span>)</span>
<span>def</span> <span>get_data</span><span>():</span>
    <span>start_time</span> <span>=</span> <span>time</span><span>.</span><span>time</span><span>()</span>
    <span># Process request </span>    <span>result</span> <span>=</span> <span>{</span><span>"</span><span>data</span><span>"</span><span>:</span> <span>"</span><span>example</span><span>"</span><span>}</span>
    <span>REQUEST_TIME</span><span>.</span><span>labels</span><span>(</span><span>endpoint</span><span>=</span><span>'</span><span>/api/data</span><span>'</span><span>).</span><span>observe</span><span>(</span><span>time</span><span>.</span><span>time</span><span>()</span> <span>-</span> <span>start_time</span><span>)</span>
    <span>return</span> <span>result</span>

<span>if</span> <span>__name__</span> <span>==</span> <span>'</span><span>__main__</span><span>'</span><span>:</span>
    <span>app</span><span>.</span><span>run</span><span>(</span><span>host</span><span>=</span><span>'</span><span>0.0.0.0</span><span>'</span><span>,</span> <span>port</span><span>=</span><span>8000</span><span>)</span>
from flask import Flask, request import time import prometheus_client from werkzeug.middleware.dispatcher import DispatcherMiddleware from prometheus_client import make_wsgi_app # Create Flask app app = Flask(__name__) # Setup Prometheus metrics REQUEST_TIME = prometheus_client.Summary('request_processing_seconds', 'Time spent processing request', ['endpoint']) # Add prometheus wsgi middleware app.wsgi_app = DispatcherMiddleware(app.wsgi_app, { '/metrics': make_wsgi_app() }) @app.route('/api/data') def get_data(): start_time = time.time() # Process request result = {"data": "example"} REQUEST_TIME.labels(endpoint='/api/data').observe(time.time() - start_time) return result if __name__ == '__main__': app.run(host='0.0.0.0', port=8000)

Enter fullscreen mode Exit fullscreen mode

In my experience, the key to effective performance monitoring is collecting the right metrics consistently over time. This makes it possible to detect gradual degradations that might otherwise go unnoticed until they become severe problems.

Integrating Profiling into Development Workflows

Performance should be part of the development process, not just an afterthought:

<span># pytest_profile.py </span><span>import</span> <span>pytest</span>
<span>import</span> <span>cProfile</span>
<span>import</span> <span>pstats</span>
<span>import</span> <span>os</span>
<span>@pytest.fixture</span>
<span>def</span> <span>profile</span><span>(</span><span>request</span><span>):</span>
<span>profiler</span> <span>=</span> <span>cProfile</span><span>.</span><span>Profile</span><span>()</span>
<span>profiler</span><span>.</span><span>enable</span><span>()</span>
<span>yield</span> <span>profiler</span>
<span>profiler</span><span>.</span><span>disable</span><span>()</span>
<span>ps</span> <span>=</span> <span>pstats</span><span>.</span><span>Stats</span><span>(</span><span>profiler</span><span>).</span><span>sort_stats</span><span>(</span><span>'</span><span>cumtime</span><span>'</span><span>)</span>
<span># Create profile output directory </span> <span>os</span><span>.</span><span>makedirs</span><span>(</span><span>'</span><span>profiles</span><span>'</span><span>,</span> <span>exist_ok</span><span>=</span><span>True</span><span>)</span>
<span>test_name</span> <span>=</span> <span>request</span><span>.</span><span>node</span><span>.</span><span>name</span>
<span>ps</span><span>.</span><span>dump_stats</span><span>(</span><span>f</span><span>'</span><span>profiles/</span><span>{</span><span>test_name</span><span>}</span><span>.prof</span><span>'</span><span>)</span>
<span>ps</span><span>.</span><span>print_stats</span><span>(</span><span>10</span><span>)</span>
<span># Usage in test file </span><span>def</span> <span>test_performance_critical_function</span><span>(</span><span>profile</span><span>):</span>
<span># Test code here </span> <span>result</span> <span>=</span> <span>my_function</span><span>()</span>
<span>assert</span> <span>result</span> <span>==</span> <span>expected_value</span>
<span># pytest_profile.py </span><span>import</span> <span>pytest</span>
<span>import</span> <span>cProfile</span>
<span>import</span> <span>pstats</span>
<span>import</span> <span>os</span>

<span>@pytest.fixture</span>
<span>def</span> <span>profile</span><span>(</span><span>request</span><span>):</span>
    <span>profiler</span> <span>=</span> <span>cProfile</span><span>.</span><span>Profile</span><span>()</span>
    <span>profiler</span><span>.</span><span>enable</span><span>()</span>

    <span>yield</span> <span>profiler</span>

    <span>profiler</span><span>.</span><span>disable</span><span>()</span>
    <span>ps</span> <span>=</span> <span>pstats</span><span>.</span><span>Stats</span><span>(</span><span>profiler</span><span>).</span><span>sort_stats</span><span>(</span><span>'</span><span>cumtime</span><span>'</span><span>)</span>

    <span># Create profile output directory </span>    <span>os</span><span>.</span><span>makedirs</span><span>(</span><span>'</span><span>profiles</span><span>'</span><span>,</span> <span>exist_ok</span><span>=</span><span>True</span><span>)</span>
    <span>test_name</span> <span>=</span> <span>request</span><span>.</span><span>node</span><span>.</span><span>name</span>
    <span>ps</span><span>.</span><span>dump_stats</span><span>(</span><span>f</span><span>'</span><span>profiles/</span><span>{</span><span>test_name</span><span>}</span><span>.prof</span><span>'</span><span>)</span>
    <span>ps</span><span>.</span><span>print_stats</span><span>(</span><span>10</span><span>)</span>

<span># Usage in test file </span><span>def</span> <span>test_performance_critical_function</span><span>(</span><span>profile</span><span>):</span>
    <span># Test code here </span>    <span>result</span> <span>=</span> <span>my_function</span><span>()</span>
    <span>assert</span> <span>result</span> <span>==</span> <span>expected_value</span>
# pytest_profile.py import pytest import cProfile import pstats import os @pytest.fixture def profile(request): profiler = cProfile.Profile() profiler.enable() yield profiler profiler.disable() ps = pstats.Stats(profiler).sort_stats('cumtime') # Create profile output directory os.makedirs('profiles', exist_ok=True) test_name = request.node.name ps.dump_stats(f'profiles/{test_name}.prof') ps.print_stats(10) # Usage in test file def test_performance_critical_function(profile): # Test code here result = my_function() assert result == expected_value

Enter fullscreen mode Exit fullscreen mode

This approach integrates performance testing directly into the test suite, making performance regressions visible during regular development cycles.

Best Practices for Performance Monitoring

From my experience implementing these tools across various organizations, I’ve developed some key best practices:

Profile in development with detailed tools like cProfile, but use low-overhead solutions like py-spy in production.

Focus on the “critical path” first – identify the 20% of code that accounts for 80% of execution time.

Establish performance baselines and track changes over time to catch gradual degradations.

Integrate performance metrics with your regular monitoring and alerting system.

Use distributed tracing for microservices architectures to get end-to-end visibility.

Set up automated performance regression testing as part of your CI/CD pipeline.

In production environments, monitor both average and percentile metrics (p95, p99) to catch issues that affect only a subset of users.

The combination of these practices and tools has consistently helped me identify and resolve performance bottlenecks before they impact users. By making performance monitoring a continuous process rather than a one-time optimization effort, you can ensure your Python applications remain responsive and efficient as they evolve.


101 Books

101 Books is an AI-driven publishing company co-founded by author Aarav Joshi. By leveraging advanced AI technology, we keep our publishing costs incredibly low—some books are priced as low as $4—making quality knowledge accessible to everyone.

Check out our book Golang Clean Code available on Amazon.

Stay tuned for updates and exciting news. When shopping for books, search for Aarav Joshi to find more of our titles. Use the provided link to enjoy special discounts!

Our Creations

Be sure to check out our creations:

Investor Central | Investor Central Spanish | Investor Central German | Smart Living | Epochs & Echoes | Puzzling Mysteries | Hindutva | Elite Dev | JS Schools


We are on Medium

Tech Koala Insights | Epochs & Echoes World | Investor Central Medium | Puzzling Mysteries Medium | Science & Epochs Medium | Modern Hindutva

原文链接:10 Essential Python Profiling Tools to Boost Application Performance

© 版权声明
THE END
喜欢就支持一下吧
点赞10 分享
Time and experience heals pain.
时间和经历会抚平一切伤痛
评论 抢沙发

请登录后发表评论

    暂无评论内容