Jupyterlab, Python3, asyncio – asynchronous tasks in a notebook background thread

Jupyterlab and IPython are always good for some surprises. Things that work in a standard Python task in Eclipse or at the prompt of a Linux shell may not work in a Python notebook within a Jupyterlab environment. One example of things behaving a bit differently in Jupyterlab is asynchronous tasks.

This post is about starting and stopping asynchronous tasks via the Python3 package “asyncio” in a Jupyterlab notebook. In addition we do not want to block the usage of further notebook cells despite long or infinitely running asyncio-tasks.

To achieve this we have to use nested asyncio-loops and start them as background jobs. In addition we also want to stop such loops from the foreground, i.e. from other cells in the notebook.

Being able to do such things is helpful in many Machine-Learning contexts, e.g. when you want to move multiple concurrent training and evaluation tasks into the background. It may also help when you want to control, on your own, how separately started Qt5 or Gtk3/Gtk4 windows on the Linux desktop are updated with new data.

Level of this post: Advanced. You should have some experience with Jupyterlab and with the packages asyncio and IPython.lib.backgroundjobs.

Warnings and clarifications

Experimenting with asyncio and threads requires some knowledge. One reason to be careful: The asyncio-package has changed rapidly across Python3 versions. You have to test thoroughly what works in your virtual Python3 environment and what does not work or no longer works.

1) Asynchronous jobs are not threads

Just to avoid confusion: When you start asynchronous tasks via asyncio, no new Python threads are opened. Instead asyncio tasks are functions which run concurrently, but under the control of one and the same loop (in one and the same Python thread, most often the main thread). Concurrency is something different from threads or multiprocessing. It is an efficient way to intermittently distribute work between jobs of which at least one has to wait for events. I recommend spending a few minutes on reading the nice introduction to asyncio by Brad Solomon.
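
To illustrate the point, here is a minimal sketch, written for a plain Python script rather than for a notebook cell (in a notebook you would typically write "await main()" instead of asyncio.run(main())). The names report(), main() and thread_ids are hypothetical and only serve the illustration. Both tasks record the id of the thread they run in, and it is the same as the id of the main thread.

```python
import asyncio
import threading

thread_ids = []  # collects (task name, thread id) pairs


async def report(name, delay):
    # Sleep a little, then record the id of the thread this task runs in.
    await asyncio.sleep(delay)
    thread_ids.append((name, threading.get_ident()))


async def main():
    # Both coroutines run concurrently under one and the same loop.
    await asyncio.gather(report("a", 0.02), report("b", 0.01))


asyncio.run(main())
main_thread_id = threading.get_ident()
print(thread_ids, main_thread_id)
```

Task "b" finishes first because it sleeps for a shorter time, yet neither task has left the main thread.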

2) Warning: There is already a loop running in a Jupyterlab Python notebook

Those of you who have already tried to work with asyncio-jobs in Jupyterlab notebooks may have come across unexpected errors. My own experience was that some of these errors are probably due to the fact that the notebook itself already has an asyncio-loop running. The command asyncio.get_event_loop() will point to this basic control loop. As a consequence new tasks started via asyncio.get_event_loop().run_until_complete(task) will lead to an error. And any job which tries to stop the running notebook loop to end additionally assigned tasks [via get_event_loop().create_task(function())] will in the end crash the notebook’s kernel.
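
The error can be reproduced outside a notebook with the following sketch. It assumes a plain Python script without nest_asyncio; inner() and outer() are hypothetical names. outer() plays the role of code that is executed while a loop is already running and tries to call run_until_complete() on that very loop.

```python
import asyncio


async def inner():
    return "inner done"


async def outer():
    # We are inside a running loop here, just like code in a notebook cell.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        loop.run_until_complete(coro)
        return "no error"
    except RuntimeError as err:
        coro.close()  # avoids a "coroutine was never awaited" warning
        return str(err)


message = asyncio.run(outer())
print(message)
```

The returned message should state that the event loop is already running, which is the situation we face in a notebook.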

3) Warning: Asynchronous tasks are blocking the notebook cell from which they are started

There is a consequence of 1 and 2: Adding a new task to the running loop of the IPython notebook via
asyncio.get_event_loop().create_task(your_function())
has a cell blocking effect. I.e. you have to wait until your asynchronous task has finished before you can use notebook cells other than the one you used to start your task. So, please, do not start infinitely running asyncio tasks before you know you have complete control.

4) Consequences

We need a nesting of asyncio-loops. I.e. we need a method to start our own loops within the control of the notebook’s main loop. And: We must transfer our new loop and its assigned tasks into a background thread. In the following example I will therefore demonstrate four things:

Defining the start of a new and nested asyncio-loop to avoid conflicts with the running loop of the notebook.
Putting all predefined asynchronous actions into a background thread.
Stopping a running asyncio-loop in the background thread.
Cancelling asyncio-tasks in the background thread.
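
Before we turn to the notebook cells, the first two points can be sketched in a few lines. Note the assumptions: this sketch uses the standard library's threading.Thread instead of IPython's backgroundjobs, and the names count_up(), run_private_loop() and results are hypothetical. It is only meant to show the principle of a private asyncio-loop living in its own thread.

```python
import asyncio
import threading

results = []


async def count_up(n, interval):
    # A short asynchronous job with intermittent sleeping.
    for _ in range(n):
        await asyncio.sleep(interval)
    results.append(("background", n))


def run_private_loop(n, interval):
    # Create a new loop and make it the current loop of THIS thread only.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(count_up(n, interval))
    finally:
        loop.close()


worker = threading.Thread(target=run_private_loop, args=(5, 0.02), daemon=True)
worker.start()
# The foreground is not blocked while the background loop is working.
results.append(("foreground", "still responsive"))
worker.join(timeout=5)
print(results)
```

The foreground entry is appended first, because the starting thread continues immediately while the background loop is still sleeping.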

Example – main code cells and explanations

The following code example illustrates the basic steps listed above. It can also be used as a basis for your own experiments.

I ran the code in Jupyterlab 4.0.8, with Python version 3.9.6, IPython 8.5.0, notebook 7.0.6 and other packages, all of which were updated to their present versions (as of 11/22/23).

Cell 1 – Imports

import os
import time
import asyncio
import nest_asyncio
import matplotlib.backends
import matplotlib.pyplot as plt
from IPython.lib import backgroundjobs as bg

The only thing which may surprise you is the package “nest_asyncio“. It is required to work with nested asyncio-loops. We need it in particular to be able to stop new asyncio-loops which have been started under the control of the notebook’s main loop.

Cell 2 – Activate nested asyncio

nest_asyncio.apply()

This is a super-important statement! Do not forget it! Otherwise you will not get full control.

Cell 3 – Functions for asynchronous tasks

async def sprinter(num_sprinter=200, b_print=False):
    if b_print: 
        print("sprinter: num_sprinter: ", num_sprinter)
    i=0
    print('sprinter :', i)
    while i < num_sprinter:
        i += 1
        if i%20 == 0:
            print('sprinter :', i)
        await asyncio.sleep(0.1)
    print("sprint finished: ", i)

async def stopper(stop_event, num_stopper=21, b_print=False):
    if b_print:
        print("stopper: num_stopper: ", num_stopper)
    for i in range(num_stopper):
        if i%20 == 0: 
            print("stopper : ", i)
        if stop_event.is_set():
            break
        await asyncio.sleep(0.1)
    stop_event.set()
    print("finished")
    await asyncio.sleep(0.01)

There are two functions "sprinter()" and "stopper(stop_event)". They are rather simple; both only do some printing and intermittent sleeping. Note that both functions are defined with the keyword "async". This is required because these functions will later be run asynchronously under the control of an asyncio-loop.

sprinter() is just a long running job. To give you a real world example: It could be a job which redraws the canvas of an external plot with high frequency to adapt the plot figure to new data (e.g. update a Gtk-window on a KDE desktop periodically).

The function stopper() is more interesting. We will use it to stop a controlling asyncio-loop a bit later. It gets an asyncio-"Event"-object as one of its arguments. This event is triggered internally at the end of stopper's internal for-loop. But inside the for-loop we also check whether the event has already been set in some other way, e.g. from another notebook cell. Either way stopper() then finishes, which in turn fulfills the condition for ending the asyncio-loop that controls it.

Cell 4 - a job to set up a new asyncio event-loop

def run_loop(num_sprinter=400, num_stopper=21, b_print=True):
    if b_print: 
        print("run_loop: num_sprinter: ", num_sprinter)
        print("run_loop: num_stopper: ", num_stopper)
        print()

    async_loop = asyncio.new_event_loop()
    run_loop.loop = async_loop
    asyncio.set_event_loop(async_loop)

    run_loop.task1 = async_loop.create_task(sprinter(num_sprinter, b_print))
    stop_event = asyncio.Event()
    run_loop.stopx = stop_event
    run_loop.task2 = async_loop.create_task(stopper(stop_event, num_stopper, b_print))
    async_loop.run_until_complete(run_loop.task2)

This function does not need the async-keyword. It will not become an asyncio-task. Instead it creates our own new asyncio-loop by asyncio.new_event_loop() and assigns tasks to this loop.

After picking up some external parameters we set up a new asyncio-loop by asyncio.new_event_loop() and get a reference to this loop which we name "async_loop".

As we later want to access and stop the loop from outside the function run_loop() we create and set a new attribute of the function. (Remember, a function in Python is an object; see [9]). This attribute will only be accessible from outside after the function has been called once, because the assignment happens inside the function body. This is no major problem in our context. But it may take a little time when we call the function in a new Python thread; see below.
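
The behavior of such function attributes can be checked with a tiny sketch. The function remember() and its attribute "calls" are hypothetical and not part of the notebook code.

```python
def remember():
    # The attribute is created only when the function body is executed.
    remember.calls = getattr(remember, "calls", 0) + 1


exists_before_call = hasattr(remember, "calls")
remember()
remember()
calls_after = remember.calls
print(exists_before_call, calls_after)
```

Before the first call the attribute does not exist; afterward it is available to any code that can see the function object, which is what we rely on for run_loop.loop.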

After the new asyncio-loop has been defined, we set it as the current loop within the present thread-context [3]. This thread will be a new one besides the notebook-thread; see below.

Then we set up a first task - sprinter() - under the control of our "async_loop" via
async_loop.create_task( sprinter(num_sprinter, b_print) ).
Afterward we define an asyncio-Event-object which we supply to a second task based on stopper().

Note that so far these are all just definitions. Our loop "async_loop" is not yet running and our tasks have not yet been started.

We start our new loop via async_loop.run_until_complete( run_loop.task2 ). Only then do the two tasks run and do their jobs. This would be very different if we had used asyncio.get_event_loop().create_task(). Had we done that, we would have started a task directly in the asyncio-loop of the notebook!

Note that with starting the event loop with run_until_complete() we defined a condition for the loop's existence:
The loop will stop in a natural way as soon as task2 is finished.

So even if task1, i.e. sprinter(), had run infinitely, it would be stopped as soon as stopper() finishes. stopper() is a kind of emergency tool - we will later extend task1 significantly. stopper() always offers us a clean way to stop the whole asyncio-loop correctly. See also [4], [5].
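
The effect of run_until_complete() on the other task can be studied in isolation. The following is a minimal sketch under some assumptions: it is meant for a plain Python script (or a background thread) in which no other loop is running, and long_job() and short_job() are hypothetical stand-ins for sprinter() and stopper(). It also shows one possible way to cancel the leftover task before closing the loop.

```python
import asyncio


async def long_job():
    await asyncio.sleep(3600)  # stands in for a practically endless task


async def short_job():
    await asyncio.sleep(0.01)
    return "short done"


loop = asyncio.new_event_loop()
try:
    long_task = loop.create_task(long_job())
    short_task = loop.create_task(short_job())
    # The loop runs only until short_task has finished.
    result = loop.run_until_complete(short_task)
    still_pending = not long_task.done()
    # Clean up: cancel the leftover task and let the loop process the cancellation.
    long_task.cancel()
    try:
        loop.run_until_complete(long_task)
    except asyncio.CancelledError:
        pass
    was_cancelled = long_task.cancelled()
finally:
    loop.close()

print(result, still_pending, was_cancelled)
```

The long job is not finished when the loop stops; it is merely no longer driven by anything until we cancel it explicitly.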

Cell 5 - Starting a background job

# Numbers of internal iterations of 
# sprinter() and stopper()
num_sprinter = 400  
num_stopper = 41     

a = run_loop
jobs = bg.BackgroundJobManager()
out = jobs.new(a, num_sprinter, num_stopper)
print()
print(out)
print()

We set the number of loop-iterations for our two tasks first. Note that the number for stopper() is chosen to be much smaller than that for sprinter(). As both tasks use the same sleeping interval, asyncio.sleep(0.1), task2 should finish long before task1 ends. So, in a first test we should see that the asyncio-loop stops once stopper() has completed its 41 internal iterations (i = 0 to 40), i.e. after roughly 4 seconds.

In the second part of the cell we set a callback "a = run_loop". Then we set up an object "jobs" which allows us to control background-jobs from an IPython environment like Jupyterlab. bg refers to a related library (see cell 1 and [6]; in particular the section on classes and "lib.backgroundjobs").

We use the jobs.new() function to create a new Python thread and start the function run_loop() within this thread. "run_loop" in turn leads to the creation of the desired asyncio-loop in the new thread. (Note that we could have started more loops in this thread.) The positional arguments to the callback "a" are provided directly and comma-separated.

The output problem of background jobs of Jupyterlab

The output of the backgroundjob "run_loop()" will be directed to any cell which we presently use in the notebook. As the tasks are running as a background-job in another thread than the main notebook thread, we (hopefully) can work with other notebook cells after having started the job. But the output area of those cells will potentially be cluttered by messages from the background job and its asyncio-tasks. This can become very confusing.

A very simple solution on Linux would be to write the output of the background tasks into some file whose changing contents we follow by "tail -f -n100 your_filepath". Another solution, which is based on the creation of a separate HTML-window, is outlined in [10] (for a Pandas output, but we can adapt the code to our needs).
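
The file-based variant could look roughly like the following sketch, which uses Python's standard logging module. The logger name "background_demo", the thread name "bg-job", the function background_work() and the temporary file path are assumptions made only for this illustration; in practice you would pick a fixed path that you can follow with "tail -f".

```python
import logging
import os
import tempfile
import threading

# Hypothetical log file; in practice choose a fixed path you can "tail -f".
log_path = os.path.join(tempfile.mkdtemp(), "background.log")

logger = logging.getLogger("background_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the messages out of the notebook's output areas
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(threadName)s %(message)s"))
logger.addHandler(handler)


def background_work(n):
    # Stands in for the printing done by the background tasks.
    for i in range(n):
        logger.info("step %d", i)


worker = threading.Thread(target=background_work, args=(3,), name="bg-job")
worker.start()
worker.join()

with open(log_path) as fh:
    lines = fh.read().splitlines()
print(log_path, len(lines))
```

Replacing the print() calls in the tasks by logger.info() calls would keep the output areas of the notebook cells clean.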

For those who like PyQt5, probably the best solution is to open a separate, native Qt5-window containing a QTextEdit-widget with the help of PyQt5. You can write to such a window in a thread-safe way via a queue. I will demonstrate this in a forthcoming post.

First test

Let us try all this out and run all of the cells defined so far. The output of cell 5 indeed looks like:

run_loop: num_sprinter:  400
run_loop: num_stopper:  41

sprinter: num_sprinter:  400
sprinter : 0
stopper: num_stopper:  41
stopper :  0

sprinter : 20
stopper :  20
sprinter : 40
stopper :  40
finished

Exactly what we had hoped for! 🙂

Other helping cells to control the status of the background jobs

Cell 6 - Checking the status of the background job

jobs.status()

After we have run the cells up to cell 5 with the iteration numbers given above and have waited for the asyncio-loop to finish, this statement produces something like

Completed jobs:
0 :

Cell 7 - Removing a finalized job from the BackgroundJobManager() control object

When we see something like "Completed jobs" or "Dead jobs" we can use the shown number (in the above case 0) to remove the job from the list of jobs controlled by our object "jobs=bg.BackgroundJobManager()".

jobs.status()
jobs.remove(0)
jobs.status()

Afterward, we should not get any output from the last jobs.status() (provided we have not started other jobs in further threads).

How to stop the asyncio-loop in the background thread

We have multiple options to do so. Let us assume that we have set num_sprinter=5000 and num_stopper=2000. This would give us enough time to move to other cells of the notebook and test stopping options there. We prepare the following cell:

Cell 8 - stopping the asyncio-loop in the background

b_stop = 0 
if b_stop == 0:
    a.stopx.set()
elif b_stop == 1: 
    a.loop.stop()
else: 
    a.task2.cancel()
    a.task1.cancel()

This cell allows for 3 methods, which you can test independently by the following steps:

check that jobs.status() has no output (cell 6) => change b_stop (cell 8) => define run_loop again (cell 4) => restart run_loop (cell 5) => wait (!!) for two outputs from both tasks => stop run_loop via cell 8 => check the status of jobs (cell 6) => remove dead job from list

Waiting is required as the jobs have to start and because asyncio.sleep(0.1) must have run at least once on both tasks. Otherwise you will get errors referring to pending tasks. But even if this happens the loop will be dead nevertheless. Our options are:

b_stop = 0: A very clean method which uses our condition for the loop, which should only run until task2 has finished. Can we trigger a stop of this task, before its internal for-loop finalizes? Yes, we can.
We have prepared an attribute "stopx" of our function "run_loop()", which directly points to the asyncio-Event-object used in task2 (stopper). We can trigger the event by its set()-method and we can do this from any cell in our Jupyterlab notebook. stopper() in turn checks the status of the event within its for-loop and breaks the loop when the event has been set. Then stopper() finishes and the asyncio-loop is stopped and removed.

b_stop = 1: This method uses the function's attribute "run_loop.loop", which points to the asyncio-loop, to stop the loop directly. (Note that this method in our case requires waiting for some output first.)

b_stop = 2: This method uses the function's attributes "run_loop.task1" and "run_loop.task2", which point to the asyncio-loop's tasks. Via a.task2.cancel() and a.task1.cancel() the tasks can be removed. The asyncio-loop stops automatically afterward. (Note that this method in our case requires waiting for some output first.)
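
A side note on thread safety: the asyncio documentation states that most asyncio objects are not thread-safe and recommends loop.call_soon_threadsafe() for scheduling callbacks from another thread. The following sketch shows how the stop request could be handed over that way. It is not the code of the cells above, but a variant under stated assumptions: it uses threading.Thread instead of IPython's backgroundjobs, and the class LoopHandle as well as the functions worker() and run_in_thread() are hypothetical.

```python
import asyncio
import threading


class LoopHandle:
    """Hypothetical container for what the foreground needs to reach."""

    def __init__(self):
        self.loop = None
        self.stop_event = None
        self.ready = threading.Event()  # signals that loop and event exist
        self.ticks = 0


async def worker(handle):
    # Create the asyncio.Event inside the running loop it belongs to.
    handle.stop_event = asyncio.Event()
    handle.ready.set()
    while not handle.stop_event.is_set():
        handle.ticks += 1
        try:
            # Wake up at once when the event is set, else after the timeout.
            await asyncio.wait_for(handle.stop_event.wait(), timeout=0.05)
        except asyncio.TimeoutError:
            pass


def run_in_thread(handle):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    handle.loop = loop
    try:
        loop.run_until_complete(worker(handle))
    finally:
        loop.close()


handle = LoopHandle()
thread = threading.Thread(target=run_in_thread, args=(handle,), daemon=True)
thread.start()
handle.ready.wait(timeout=5)

# Ask the background loop itself to set the event: thread-safe hand-over.
handle.loop.call_soon_threadsafe(handle.stop_event.set)
thread.join(timeout=5)
print(thread.is_alive(), handle.ticks)
```

With this variant the background task reacts immediately instead of at the next polling step, and the foreground never touches asyncio objects directly from the wrong thread.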

Restarting the job "run_loop" after having removed it

If you want to be on the safe side, redefine "run_loop()" by running cell 4 again.

Can we start multiple background jobs - each with its own asyncio loops and tasks?

Yes, we can. You can try this out by preparing two functions with different names "run_loop1" and "run_loop2". You, of course, have to adapt other statements, too.

But note: It is not wise to start our job "run_loop" twice by simply running cell 5 two times. The reason is that both jobs would then share one and the same function object, so the attributes (loop, task1, task2, stopx) set by the second call would overwrite those of the first, which leads to unclear attribute assignments.
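
One conceivable way to get separate function objects without copying code is a small factory. This is only a sketch under assumptions: make_runner() and the attribute "result" are hypothetical, and threading.Thread replaces IPython's backgroundjobs to keep the example self-contained.

```python
import asyncio
import threading


def make_runner():
    # Each call returns a NEW function object with its own attributes.
    def runner(n_steps, interval):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        runner.loop = loop

        async def steps():
            for _ in range(n_steps):
                await asyncio.sleep(interval)
            return n_steps

        try:
            runner.result = loop.run_until_complete(steps())
        finally:
            loop.close()

    return runner


run_loop1 = make_runner()
run_loop2 = make_runner()

threads = [
    threading.Thread(target=run_loop1, args=(3, 0.01)),
    threading.Thread(target=run_loop2, args=(5, 0.01)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(run_loop1.result, run_loop2.result)
```

As run_loop1 and run_loop2 are distinct objects, their attributes "loop" and "result" cannot overwrite each other.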

Can we kill a thread (started by IPython's backgroundjobs) that runs wild?

Unfortunately, we cannot kill a thread, which we have started from a cell of a Jupyterlab notebook, by some clever commands of the backgroundjobs-package in another cell of the same Jupyter notebook. We have to wait until the thread terminates regularly.

However, we could of course eliminate such a process from a Linux terminal or by issuing the Linux kill-command via the os-package from a Jupyterlab cell. We first have to find the PID of the notebook kernel's process. (It would probably be the last kernel-process if you had started the notebook as the last one in Jupyterlab.) "ps aux | grep kernel" will help. Afterward you can show the running threads of this process via "ps -T -p <pid>". The SPID given there can be used with the "kill -9 <(S)PID>" command. Be aware that on Linux SIGKILL always terminates the whole kernel process, not just the single thread, so the notebook's kernel has to be restarted afterward.
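
The PID and the thread ids can also be determined from within Python, e.g. from a notebook cell. The following sketch assumes Python 3.8 or later, where threads have the attribute native_id (on Linux this is the kernel's thread id, which should correspond to the SPID shown by "ps -T"). The function idle() and the thread name "bg-demo" are hypothetical.

```python
import os
import threading
import time


def idle(seconds):
    time.sleep(seconds)


# A hypothetical background thread, just to have something to list.
worker = threading.Thread(target=idle, args=(0.2,), name="bg-demo", daemon=True)
worker.start()

pid = os.getpid()  # PID of the current process (the kernel, inside a notebook)
thread_table = [(t.name, t.native_id) for t in threading.enumerate()]

print("PID:", pid)
for name, native_id in thread_table:
    print(name, native_id)

worker.join()
```

In a notebook this saves the search through the output of "ps aux", as os.getpid() directly returns the PID of the kernel that runs the cell.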

Further experiments?

You could experiment with tasks that use internal loops with different task-dependent intervals of "await asyncio.sleep(interval)". Also think about realistic examples where one task produces data which the other task uses to evaluate and to plot or print.
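
As a starting point for such experiments, here is a minimal sketch of a producer task and a consumer task with different sleep intervals, coupled by an asyncio.Queue. All names (producer(), consumer(), main(), received) are hypothetical, and the sketch is written for a plain Python script; in a notebook you would typically use "await main()" or run it in a background loop as described above.

```python
import asyncio


async def producer(queue, n_items, interval):
    # Produces one value per interval, then a sentinel to signal the end.
    for i in range(n_items):
        await asyncio.sleep(interval)
        await queue.put(i * i)
    await queue.put(None)


async def consumer(queue, interval, received):
    # Takes values from the queue; a real task might plot or print them.
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)
        await asyncio.sleep(interval)  # simulated evaluation time


async def main():
    queue = asyncio.Queue()
    received = []
    await asyncio.gather(
        producer(queue, 5, 0.01),
        consumer(queue, 0.02, received),
    )
    return received


received = asyncio.run(main())
print(received)
```

The queue decouples the two intervals: the producer never has to wait for the slower consumer, and the consumer sleeps inside queue.get() whenever no data are available.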

Conclusion

With this post I have shown that we can start and stop asyncio-tasks in the background of a Jupyterlab Python notebook. Important ingredients were the use of nest_asyncio.apply() and the creation of a new asyncio-loop in the background of the notebook.

This appears to be useful in many situations. One example would be concurrent jobs of which at least one waits for some events or data from other tasks. This waiting time can be used by other functions, while event conditions are checked intermittently by task-internal loops with sleep intervals. Another example could be a job that creates information while another job plots this information concurrently.

In my simple example presented above the tasks had some relatively long intermittent await-intervals of 0.1 sec. In real world examples, in particular when plot-updates are involved, we would probably use significantly shorter intervals. In addition we would use different wait-intervals per task (fitting the production times of each of the concurrent tasks).

A natural advantage of running in the background, i.e. in another thread, is that we do not block working with other notebook cells. We can use code in the foreground while concurrent jobs meanwhile do their work in the background. We have to take care of separating the outputs of background jobs from the outputs of the concurrently used notebook cells. But this is no fundamental problem.

In a forthcoming post in this blog I will use the results above for controlling GTK3 windows which presently do not cooperate correctly with Matplotlib's interactive mode (ion()) in Jupyterlab. We will allow for canvas redraw-actions while at the same time awaiting a window close event.