Killer progress bars in Python

This post takes a whistle-stop tour of tqdm: a fantastic, easy-to-use, extensible progress bar package for Python. It lets you add simple progress bars to Python processes with very little code. If you’re a software engineer of some experience, chances are you’ll have used or developed algorithms or data transformations that can take a fair while – perhaps many hours or even days – to complete.

It’s not uncommon for software folks to simply print status messages to the console, or in some slightly more sophisticated cases use the (excellent and recommended) built-in logging module. In a lot of cases this may well be fine. However, if you’re running a task with many hundreds of steps, or over a data structure with many millions of elements, these approaches can be unclear and verbose, and frankly kind of ugly.

Show me the code!

That’s where tqdm can come in. It has a nice clean API that lets you quickly add progress bars to your code. Plus it has a lightweight ‘time-remaining’ estimation algorithm built in to the progress bar too. For the purposes of this post, take a look at the super-minimal example of a mocked-up loop for web scraping using tqdm, below:

import time
from tqdm import tqdm

def get():
    time.sleep(0.25)

with tqdm(total=100) as progress:
    for i in range(100):
        get()
        progress.update(1)


In this simple example, you set up a tqdm progress bar that expects a process of 100 steps (say, 100 URLs). Then you run the loop (with a 0.25 second pause at each step), updating the progress bar each time a step completes. You can also call update with an arbitrary amount, for example when one step finishes several items at once or when you leave the loop early. That’s two lines of code (plus the import statement) to get a nice little progress bar in your code.
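To make the two update styles concrete, here is a minimal sketch. The URLs, the batch size and the fetch_batch helper are all made up for illustration; only tqdm itself is real. The first loop wraps an iterable and lets tqdm count for you, and the second advances the bar by however many items each step finished.

```python
import time
from tqdm import tqdm

def fetch_batch(urls):
    # Hypothetical stand-in for real scraping work
    time.sleep(0.01)
    return len(urls)

# Made-up URLs, purely for illustration
urls = [f"https://example.com/page/{i}" for i in range(100)]
batch_size = 8

# Form 1: wrap the iterable and let tqdm count the iterations itself
for url in tqdm(urls, desc="One by one"):
    time.sleep(0.001)

# Form 2: update manually by the number of items each step completed
done = 0
with tqdm(total=len(urls), desc="In batches") as progress:
    for start in range(0, len(urls), batch_size):
        n = fetch_batch(urls[start:start + batch_size])
        progress.update(n)  # the last batch is smaller, so n varies
        done += n
```

Wrapping the iterable is the shortest option when one iteration equals one step. The manual form is worth the extra lines when the amount of work per iteration varies.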

pandas support

Beyond cool little additions to your program’s outputs, tqdm also integrates nicely with other widely used packages. Take pandas for example, the ubiquitous Python data analysis library. Data Scientists love pandas, but some transformations on data frames can take a fair while. Fortunately, there’s support for automatically adding a tqdm progress bar to calls to the apply method in pandas. Take a look at the example below:

import pandas as pd
from tqdm import tqdm

df = pd.read_csv("weather.csv")
tqdm.pandas(desc="Applying Transformation")
df.progress_apply(lambda x: x)


When you run this script, you’ll see something like this:

Technically, the tqdm.pandas method monkey-patches the progress_apply method onto pandas data structures, giving them a modified version of the commonly used apply method. Practically, when you call the progress_apply method, the package wraps the standard pandas apply method with a tqdm progress bar. This can come in really handy when you’re processing large data frames!
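If you don’t have a weather.csv to hand, the following sketch shows the same idea on a tiny, made-up data frame (the temp_c column and its values are invented for illustration). It also checks that progress_apply gives the same result as plain apply, which is what you’d expect if it is just a wrapper.

```python
import pandas as pd
from tqdm import tqdm

# Made-up data standing in for weather.csv
df = pd.DataFrame({"temp_c": [12.5, 18.0, 21.3, 9.8]})

# Registers progress_apply on pandas objects such as Series and DataFrame
tqdm.pandas(desc="Converting to Fahrenheit")

# Same call shape as Series.apply, with a progress bar drawn while it runs
df["temp_f"] = df["temp_c"].progress_apply(lambda c: c * 9 / 5 + 32)

# Plain apply, for comparison
expected = df["temp_c"].apply(lambda c: c * 9 / 5 + 32)
```

Because the patch happens at call time, tqdm.pandas needs to run before the first progress_apply call, but it doesn’t matter whether the data frame was created before or after.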

Parallel processing support

There’s another common application worth mentioning here: tqdm is also great for setting up progress bars for parallel processes. Here is an example using some of tqdm’s built-in support for updating a progress bar for a parallel map:

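import time
from tqdm.contrib.concurrent import process_map

def my_process(_):
    # Simulate a unit of work
    time.sleep(0.25)

if __name__ == "__main__":
    # The guard matters on platforms that start workers in a fresh interpreter
    r = process_map(my_process, range(0, 100), max_workers=2, desc="MyProcess")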


In this case, you’ll have a single progress bar that gets updated each time a my_process call finishes. There’s a second use case though: how about if you’ve got a few long-running processes and you want to track these individually? This might be preferable if you want to avoid serialising and de-serialising large objects into and out of processes, for example. You can do that too:

import time
import multiprocessing as mp
from tqdm import tqdm

def my_process(pos):
    _process = mp.current_process()
    with tqdm(desc=f"Process {pos}", total=100, position=pos) as progress:
        for _ in range(100):
            time.sleep(0.1)
            progress.update(1)

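if __name__ == "__main__":
    n_cpu = mp.cpu_count()
    # Share tqdm's write lock with every worker so the bars don't overwrite each other
    with mp.Pool(processes=n_cpu, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as pool:
        pool.map(my_process, range(n_cpu))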


This should give you an output something along the lines of:

There’s a Gist of this example you can use too.
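Going back to the single-bar parallel map for a moment: if your work is mostly waiting on the network (like the scraping example at the start), threads are often enough. The same tqdm.contrib.concurrent module also provides thread_map, which takes the same kind of arguments as process_map. Here’s a small sketch; the URLs and the fetch_length helper are invented stand-ins for real requests.

```python
import time
from tqdm.contrib.concurrent import thread_map

def fetch_length(url):
    # Hypothetical stand-in for an HTTP request
    time.sleep(0.01)
    return len(url)

# Made-up URLs, purely for illustration
urls = [f"https://example.com/page/{i}" for i in range(20)]

# One shared progress bar; results come back as a list in input order
lengths = thread_map(fetch_length, urls, max_workers=4, desc="Fetching")
```

Since threads share memory, there’s nothing to serialise between workers, and you don’t need the main-module guard that process-based pools can require.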

Jupyter support

The last integration I’ll be touching on in this post is the built-in support for using tqdm in a Jupyter Notebook. To do this, you’ll need to make sure you’ve installed Jupyter, as well as ipywidgets. You’ll then need to run:

jupyter nbextension enable --py widgetsnbextension


This enables the widgets extension. With this set up, in a cell in a new notebook, you should be able to run an example much like the one from earlier:

from tqdm.notebook import tqdm

arr = list(range(100))

with tqdm(desc="My Progress bar", total=len(arr)) as progress:
    for element in arr:
        progress.update(1)


And see something similar to this:

Cool, right?
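If the same code needs to run both in a notebook and from a terminal, one option is to import from tqdm.auto instead, which picks the notebook widget when it can and falls back to the plain console bar otherwise. A minimal sketch, with the loop body made up for illustration:

```python
import time
from tqdm.auto import tqdm

total = 0
# Renders as a widget in a notebook, or as a text bar in a terminal
for element in tqdm(range(100), desc="My Progress bar"):
    time.sleep(0.001)
    total += element
```

This keeps the import line identical across environments, so you don’t have to edit a script when you move it out of a notebook.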

Further reading

Interested in finding out more about tqdm? Here’s their GitHub.

The cover image for this post was taken from a TED talk on progress bars. It’s worth checking out.

Original post: Killer progress bars in Python
