
Using PyQt with QtAgg in Jupyterlab – IV – simple PyQt and MPL application with background worker and receiver threads

As a reader of this post you are probably interested in Machine Learning [ML] and hopefully in Linux systems as an ML platform as well. This post series aims to build a bridge between the standard tool-set of Python3 notebooks in Jupyterlab for controlling ML-algorithms and graphical Qt-applications on your Linux desktop. The objective is to become more independent of some limitations of the browser-based Jupyterlab notebooks.

One aspect is the use of graphical Qt-based control elements (e.g. buttons) in desktop windows. Another is the use of background threads to produce (ML) data which we later, e.g. during training runs, display in Qt windows. Background threads will also enable us to run smaller code segments in other cells of our notebook during long ML-runs. We are also confident that we can keep both our Qt windows and Jupyterlab interactive during such runs.

We will later use the callback machinery of Keras-based ML-runs to produce ML-data and other information about a running ML-algorithm in the background of Jupyterlab. These data will be sent to Matplotlib- and Qt-callback-functions in Jupyterlab which then update Qt windows.

Knowledge gained so far …

During the previous posts we have gathered enough information to now build an example PyQt application, which utilizes two background threads.

We have seen that QtAgg, a backend bridge for producing Matplotlib [MPL] plots in Qt windows, can be used for full-fledged PyQt applications, too. In the first post we became familiar with some useful Qt-widgets and the general structure of Qt-Apps.

In the 2nd and 3rd posts we have learned that both Matplotlib figures and Qt-widgets must be controlled by the main thread associated with our Jupyterlab notebook. A Qt event loop is started in this thread by QtAgg for us. We have also noted that background threads controlled by QThread-objects can send signals which end up serialized in the Qt event queue of the main thread. From there they can be handled asynchronously, but in timely order by callbacks, which in turn update Qt-widgets for MPL-plots and other information. The 3rd post discussed a general pattern to employ both a raw data producing worker thread and a receiver thread to prepare the data for eventual foreground handling.

Objective of this post

In this post I will discuss a simple application that produces data with the help of two background threads according to the pattern discussed in the previous post. All data and information will periodically be sent from the background to callbacks in the main thread. Although we only use one main Qt window the structure of the application includes all key elements to serve as a blueprint for more complex situations. We will in particular discuss how to stop the background jobs and their threads in a regular way. An interesting side topic will be how one captures print output to stdout from background jobs.

Level of this post: Advanced. Some experience with Jupyterlab, QtAgg, Matplotlib and (asynchronous) PyQt is required. The first three posts of this series provide (in my opinion) a quick, though steep learning curve for PyQt newbies.

Application elements

Our PyQt application will contain three major elements in a vertical layout:

  • Two buttons to start and stop two background threads. These threads provide data for a sine-curve with steadily growing frequency and some related information text.
  • A Qt-widget for a Matplotlib figure to display the changing sine curve.
  • A series of QTextEdit widgets to display messages from the background and from callbacks in the foreground.

Our pattern requires the following threads: A “worker thread” periodically creates raw data and puts them into Python queues. A “receiver thread” reads out the queues and refines the data.

In our case the receiver thread will add additional information and data. Then signals are used to communicate with callbacks in the main thread. We send all data for widget and figure updates directly with the signals. This is done for demonstration purposes. We could also have used supplemental data queues for the purpose of inter-thread data exchange. For plotting we use Matplotlib and the related Figure.canvas-widget provided by QtAgg.

So, we have a main thread with a Qt event loop (and of course a loop for Jupyterlab REPL interaction) and two background threads which perform some (simple) asynchronous data production for us.

Our challenge: Qt and Matplotlib control with Python code in a Jupyterlab notebook

The application looks pretty simple. And its structure will indeed be simple. However, as always, the devil is in the details. In our particular situation with Jupyterlab we need to get control over the following tasks:

  • setup and start of two background threads – a worker thread and a receiver thread,
  • assignment of worker and receiver objects to these threads, i.e. setting the threads' affinity,
  • asynchronous inter-thread communication and data exchange via signals,
  • updates of Qt-widgets and integrated Matplotlib figures,
  • spinning the Qt-event-loop in the main thread to ensure quick widget updates,
  • a regular stop of thread activities and a removal of thread-related objects,
  • checking interactivity of both the Jupyterlab and the Qt-interface,
  • stability of the plot-production against potentially conflicting commands from the main thread.

All via code executed in cells of a Python notebook. An additional topic is:

  • capturing print-commands in the background and transmission of the text to the foreground.
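How such a capture might work can be sketched in pure Python (in the actual Qt application the callback passed in would be the emit() method of a signal that carries the text to a QTextEdit in the foreground):

```python
from contextlib import redirect_stdout

class StdoutForwarder:
    """Minimal file-like object: forwards print output line-wise to a
    callback. In the Qt application the callback would be a signal's
    emit(), so the text ends up in the main thread's event queue."""
    def __init__(self, callback):
        self.callback = callback
        self._buf = ""

    def write(self, text):
        self._buf += text
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            if line:
                self.callback(line)

    def flush(self):
        # required by the file-like protocol; nothing to do here
        pass

# usage: capture print() output of a (background) job
captured = []
with redirect_stdout(StdoutForwarder(captured.append)):
    print("epoch 1: loss = 0.42")
```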

Using PyQt with QtAgg in Jupyterlab – III – a simple pattern for background threads

We can use PyQt to organize the output of Machine Learning applications in Qt-windows outside of Jupyterlab notebooks on a Linux desktop. PyQt also provides us with an option to put long running Python code, such as ML training and evaluation runs, into the background of Jupyterlab and redirect graphical and text output to elements of Qt windows. Moving long-lasting Python jobs and ML algorithms to the background of Jupyterlab would have the advantages

  • that we could run short code segments in other notebook cells in the meantime
  • and keep up the responsiveness of PyQt and Qt-based Matplotlib windows on the desktop.

In the first two posts of this series

we saw that PyQt and its GUI-widgets work perfectly together with Matplotlib’s backend QtAgg. Matplotlib figures are actually handled as special Qt widgets by QtAgg. We also gathered some information on threads in relation to Python and (Py)Qt. We understood that all (Py)Qt-GUI-classes and widgets must be run in the main thread of Jupyterlab and that neither Qt-widgets nor Matplotlib functions are thread-safe.

As a consequence we need some thread-safe, serializing communication method between background threads and the main thread. Qt-signals are well suited for this purpose as they end up in the event queue of the target thread, where fitting slots, i.e. callback functions, handle them. The event queue and the related event loop in the main thread of a Qt application enforce the required serialization for our widgets and Matplotlib figures.
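A minimal, standalone sketch of this mechanism (PyQt5 assumed; class and signal names are my own): a worker object with affinity to a QThread emits a signal, which ends up in the main thread's event queue and is handled there by a callback.

```python
from PyQt5.QtCore import QCoreApplication, QObject, QThread, pyqtSignal

class Worker(QObject):
    # cross-thread signals are queued as events in the receiver's thread
    progress = pyqtSignal(int)
    finished = pyqtSignal()

    def run(self):
        for i in range(3):
            self.progress.emit(i)   # serialized into the main event queue
        self.finished.emit()

# standalone demo; in Jupyterlab QtAgg has created the application already
app = QCoreApplication.instance() or QCoreApplication([])
received = []

thread = QThread()
worker = Worker()
worker.moveToThread(thread)                 # set the worker's thread affinity
thread.started.connect(worker.run)
worker.progress.connect(received.append)    # callback runs in the main thread
worker.finished.connect(thread.quit)
worker.finished.connect(app.quit)
thread.start()
app.exec_()                                 # spin the main event loop
thread.wait()
```

Because the connection is queued across threads, the callback is executed asynchronously, but in emission order, by the main event loop.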

In this post I want to discuss a simple pattern of how to put workload for data production and refinement into the background and how to trigger the updates of graphical PyQt windows from there. The pattern is based on elements discussed in the 2nd post of this series.

Pattern for the interaction of background threads with Qt objects and widgets in the foreground

You may have read about various thread-related patterns, such as the producer/consumer pattern or the sender/receiver pattern.

It might appear that the main thread of a Jupyter notebook with an integrated main Qt event loop would be a natural direct consumer or receiver of data produced in the background for graphical updates. One could therefore be tempted to think of a private queue as an instrument of serialization which is read out periodically from an object in the main thread.

However, what we cannot do is to run a loop with a time.sleep(interval)-function in a notebook cell in the main thread for periodic queue handling. The reason is that we do not want to block other code cells or the main event loop in our Python notebook. While it is true that time.sleep() suspends a thread, so another thread can run (under the control of the GIL), the problem remains that within the original thread other code execution is blocked. (Actually, we could circumvent this problem by utilizing asyncio in a Jupyterlab notebook. But this is yet another pattern for parallelization. We will look at it in another post series.)

Now we have two options:

  1. We may instead use the particular queue which is already handled asynchronously in Jupyterlab – namely the event queue started by QtAgg. We know already that signals from secondary (QThread-based) threads are transformed into Qt-events. We can send relevant data together with such signals (events) from the background. They are placed in the main Qt event queue and dispatched by the main event loop to callbacks.
  2. If we instead like to use a private queue for data exchange between a background and the main thread we would still use signals and respective slot functions in the main thread. We access our queue via a slot’s callback and read-out only one or a few new entries from there and work with them.

I will use the second option for the exchange of larger data objects in another post in this series. The pattern discussed in this post will be built upon the first option. We will nevertheless employ our own queue for data exchange – but this time between two threads in the background.
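The second option might be sketched like this (PyQt5 assumed; all names are illustrative): the background object puts data into a private queue and emits a lightweight signal, and the slot in the main thread performs exactly one queue access per received signal.

```python
import queue
from PyQt5.QtCore import QCoreApplication, QObject, QThread, pyqtSignal

data_q = queue.Queue()   # private, thread-safe exchange queue

class Producer(QObject):
    data_ready = pyqtSignal()       # lightweight notification, no payload
    finished = pyqtSignal()

    def run(self):
        for i in range(3):
            data_q.put({"step": i, "value": i * i})
            self.data_ready.emit()
        self.finished.emit()

# standalone demo; in Jupyterlab QtAgg has created the application already
app = QCoreApplication.instance() or QCoreApplication([])
results = []

def on_data_ready():
    # exactly one queue access per received signal
    try:
        results.append(data_q.get_nowait())
    except queue.Empty:
        pass

thread = QThread()
producer = Producer()
producer.moveToThread(thread)
thread.started.connect(producer.run)
producer.data_ready.connect(on_data_ready)   # slot runs in the main thread
producer.finished.connect(thread.quit)
producer.finished.connect(app.quit)
thread.start()
app.exec_()
thread.wait()
```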

Short running callbacks in the main thread

According to what we learned in the last post, we must take care of the following:

The code of a callback (as well as of event handlers) in the main thread should be very limited in time and execute as fast as possible to create GUI updates.

Otherwise we would block the execution of the main event loop by our callback! And that would render other graphical objects on the desktop or in the notebook unresponsive. In addition it would also block running code in other cells.

This is really an important point: The integration of Qt with Jupyterlab via a hook for handling the Qt main event loop seemingly in parallel to the IPython kernel's prompt loop is an important feature which guarantees responsiveness, and we do not want to spoil it by our background/foreground interaction.

This means that we should follow some rules to keep up the responsiveness of Jupyterlab and Qt-windows in the foreground, i.e. in the main thread of Jupyterlab:

  • All data which we want to display graphically in Qt windows should already have been optimally prepared for plotting before the slot function uses them for Qt widget or Matplotlib figure updates.
  • Slot functions (event handlers) should call QApplication.processEvents() to intermittently spin the event-loop for the update of widgets.
  • The updates of PyQt widgets should only periodically be triggered via signals from the background. The signals can carry the prepared data with them. (If we nevertheless use a private queue then the callback in the main thread should only perform one queue-access via get() per received signal.)
  • The period by which signals are emitted should be relatively big compared to the event-loop timing and the typical processing of other events.
  • We should separate raw data production in the background from periodic signal creation and the related data transfer.
  • Data production in the background should be organized along relatively small batches if huge amounts of data are to be processed.
  • We should try to circumvent parallelization limitations due to the GIL whenever possible by using C/C++-based modules.

In the end it is all about getting data and timing right. Fortunately, the amount of data which we produce during ML training runs, and which we want to display on some foreground window, is relatively small (per training epoch).

A simple pattern for background jobs and intermediate PyQt application updates

An object or function in a “worker thread” calculates and provides raw data at a certain production rate. These data are put into a queue. An object or function in a “receiver thread” periodically reads out the next entries of the queue. The receiver knows what to do with these data for plotting and presentation. It handles them, modifies them if necessary and creates signals (including some update data for PyQt widgets). It forwards these signals to a (graphical) application in the main foreground thread. There they end up as events in the Qt event queue. Qt handles the respective (signal-)events by so-called “slots“, i.e. by callbacks for the original signals. The PyQt-application there has a graphical Qt-window that visualizes (some of) the data.
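A condensed, runnable sketch of this pattern (PyQt5 assumed; all names are illustrative, and a plain list stands in for the Qt widget updates of the real application):

```python
import queue
from PyQt5.QtCore import QCoreApplication, QObject, QThread, pyqtSignal

raw_q = queue.Queue()    # worker -> receiver exchange queue

class Worker(QObject):
    """Produces raw data at some rate and puts them into the queue."""
    def run(self):
        for i in range(5):
            raw_q.put(i)
        raw_q.put(None)            # sentinel: no more data

class Receiver(QObject):
    """Reads the queue, refines the data and signals the main thread."""
    refined = pyqtSignal(str)
    finished = pyqtSignal()

    def run(self):
        while True:
            item = raw_q.get()
            if item is None:
                break
            # 'refinement' step: prepare data for display
            self.refined.emit(f"value squared: {item * item}")
        self.finished.emit()

# standalone demo; in Jupyterlab QtAgg has created the application already
app = QCoreApplication.instance() or QCoreApplication([])
shown = []                          # stand-in for a QTextEdit update

w_thread, r_thread = QThread(), QThread()
worker, receiver = Worker(), Receiver()
worker.moveToThread(w_thread)
receiver.moveToThread(r_thread)
w_thread.started.connect(worker.run)
r_thread.started.connect(receiver.run)
receiver.refined.connect(shown.append)      # callback in the main thread
receiver.finished.connect(w_thread.quit)
receiver.finished.connect(r_thread.quit)
receiver.finished.connect(app.quit)
r_thread.start()
w_thread.start()
app.exec_()
w_thread.wait(); r_thread.wait()
```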


Using PyQt with QtAgg in Jupyterlab – II – excursion on threads, signals and events

In the first post of this series on PyQt

Using PyQt with QtAgg in Jupyterlab – I – a first simple example

we have studied how to set up a PyQt application in a Jupyterlab notebook. The key to getting a seamless integration was to invoke the QtAgg-backend of Matplotlib. Otherwise we did not need to use any of Matplotlib’s functionality. For our first PyQt test application we just used multiple nested Qt-widgets in a QMainWindow to create a simple, but interactive and instructive application in a Qt-window on the desktop.

So, PyQt works well with QtAgg and IPython. We just construct and show a QMainWindow; we need no explicit exec() command. An advantage of using PyQt is that we get moveable, resizable windows on our Linux desktop, outside the browser-bound Jupyterlab environment. Furthermore, PyQt offers a lot of widgets to build a full-fledged graphical application interface with Python code.

But our funny PyQt example application still blocked the execution of code in other notebook cells! It just demonstrated why we need background threads when working with Jupyterlab and long running code segments. This would in particular be helpful when working with Machine Learning [ML] algorithms. Would it not be nice to put a task like the training of an ML algorithm into the background? And to redirect the intermediate output after training epochs into a textbox in a desktop window? While we work with other code in the same notebook?

The utilization of background threads is one of the main objectives of this post series. In the end we want to see a PyQt application (also hosting e.g. a Matplotlib figure canvas) which displays data that we created in a ML background job. All controlled from a Jupyterlab Python notebook.

To achieve this goal we need a general strategy how to split work between foreground and background tasks. The graphics on the left indicates the direction in which we will move.

But first we need a toolbox and an overview of possible restrictions regarding Python threads and PyQt GUI widgets. In this post we will therefore look at relevant topics like concurrency limitations due to the Python GIL, thread support in Qt, Qt’s approach to inter-thread communication, signals, and once again Qt’s main event loop. I will also discuss some obstacles which we have to overcome. All of this will give us sufficient knowledge to understand a concrete pattern for a workload distribution which I will present in the next post.

Level of this post: Advanced. Some general experience with asynchronous programming is helpful. But beginners have a chance, too. For a ML-project I myself had to learn at rather short notice how one can handle thread interaction with Qt. In the last section of this post you will find links to comprehensive and helpful articles on the Internet. Regarding signals I especially recommend [1.1]. Regarding threads and the helpful effects of event loops for inter-thread communication I recommend reading [3.1] and [3.2] (!) in particular. Regarding the difference between signals and events I found the discussions in [2.1] helpful.


Jupyterlab, Python3, asyncio – asynchronous tasks in a notebook background thread

Jupyterlab and IPython are always good for some surprises. Things that work in a standard Python task in Eclipse or at the prompt of a Linux shell may not work in a Python notebook within a Jupyterlab environment. One example where things behave a bit differently in Jupyterlab are asynchronous tasks.

This post is about starting and stopping asynchronous tasks via the Python3 package “asyncio” in a Jupyterlab notebook. In addition we do not want to block the usage of further notebook cells despite long or infinitely running asyncio-tasks.

To achieve this we have to use nested asyncio-loops and to start them as background jobs. In addition we also want to stop such loops from the foreground, i.e. from other cells in the notebook.

Being able to do such things is helpful in many Machine-Learning contexts. E.g. when you want to move multiple and concurrent training tasks as well as evaluation tasks into the background. It may also be helpful to control the update of separately started Qt5- or Gtk3/Gtk4-windows on the Linux desktop with new data on your own.

Level of this post: Advanced. You should have some experience with Jupyterlab and the packages asyncio and IPython.lib.backgroundjobs.

Warnings and clarifications

Experimenting with asyncio and threads requires some knowledge. One reason to be careful: The asyncio-package has changed rapidly with Python3 versions. You have to test thoroughly what works in your virtual Python3 environment and what does not or no longer work.

1) Asynchronous jobs are not threads

Just to avoid confusion: When you start asynchronous tasks via asyncio, no new Python threads are opened. Instead, asyncio tasks are functions which run concurrently, but under the control of one and the same loop (in one and the same Python thread, most often the main thread). Concurrency is something different from threads or multiprocessing. It is an efficient way to intermittently distribute work between jobs of which at least one has to wait for events. I recommend spending a few minutes reading the nice introduction to asyncio given here by Brad Solomon.
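A tiny, plain-Python illustration of this point: two tasks run concurrently, but both in the very thread that runs the loop.

```python
import asyncio
import threading

async def job(name, delay, log):
    # while one coroutine awaits, the loop runs the other one
    await asyncio.sleep(delay)
    log.append((name, threading.get_ident()))

async def main():
    log = []
    # concurrent execution - but no new threads are involved
    await asyncio.gather(job("fast", 0.01, log),
                         job("slow", 0.02, log))
    return log

log = asyncio.run(main())
# both entries carry the ident of the one thread running the loop
```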

2) Warning: There is already a loop running in a Jupyterlab Python notebook

Those of you who have already tried to work with asyncio-jobs in Jupyterlab notebooks may have come across unexpected errors. My own experience was that some of these errors are probably due to the fact that the notebook itself already has an asyncio-loop running. The command asyncio.get_event_loop() will point to this basic control loop. As a consequence, new tasks started via asyncio.get_event_loop().run_until_complete(task) will lead to an error. And any job which tries to stop the running notebook loop to end additionally assigned tasks [via get_event_loop().create_task(function())] will in the end crash the notebook’s kernel.
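The underlying restriction can be reproduced outside Jupyterlab as well (a sketch): calling run_until_complete() on a loop that is already running raises a RuntimeError.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()   # the loop which is already running
    coro = asyncio.sleep(0)
    try:
        loop.run_until_complete(coro)   # not allowed on a running loop
    except RuntimeError as exc:
        coro.close()                    # avoid a 'never awaited' warning
        return str(exc)
    return "no error"

msg = asyncio.run(main())
```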

3) Warning: Asynchronous tasks are blocking the notebook cell from which they are started

There is a consequence of 1 and 2: Adding a new task to the running loop of the IPython notebook via
asyncio.get_event_loop().create_task(your_function())
has a cell-blocking effect. I.e. you have to wait until your asynchronous task has finished before you can use other notebook cells (than the one you used to start your task). So, please, do not start infinitely running asyncio tasks before you know you have complete control.

4) Consequences

We need a nesting of asyncio-loops. I.e. we need a method to start our own loops within the control of the notebook’s main loop. And: We must transfer our new loop and its assigned tasks into a background thread. In the following example I will therefore demonstrate four things:

  1. Starting a new, nested asyncio-loop to avoid conflicts with the running loop of the notebook.
  2. Putting all predefined asynchronous actions into a background thread.
  3. Stopping a running asyncio-loop in the background thread.
  4. Cancelling asyncio-tasks in the background thread.
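The four steps might look like the following plain-Python sketch (the names are my own; in the notebook the thread could also be started via IPython.lib.backgroundjobs):

```python
import asyncio
import threading
import time

results = []

async def ticker(name):
    # an 'infinitely' running task; only cancellation ends it
    try:
        while True:
            results.append(name)
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        results.append(f"{name} cancelled")
        raise

def background_loop(loop):
    # steps 1 + 2: run our own, nested loop inside a background thread
    asyncio.set_event_loop(loop)
    loop.run_forever()
    # step 4: after stop(), cancel leftover tasks and let them finish
    pending = asyncio.all_tasks(loop)
    for task in pending:
        task.cancel()
    loop.run_until_complete(asyncio.gather(*pending, return_exceptions=True))
    loop.close()

loop = asyncio.new_event_loop()
thread = threading.Thread(target=background_loop, args=(loop,))
thread.start()

# schedule a task from the foreground (thread-safe)
asyncio.run_coroutine_threadsafe(ticker("tick"), loop)
time.sleep(0.05)

# step 3: stop the background loop from the foreground (thread-safe)
loop.call_soon_threadsafe(loop.stop)
thread.join()
```

call_soon_threadsafe() and run_coroutine_threadsafe() are the only safe ways to talk to the background loop from the foreground thread.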

Example – main code cells and explanations

The following code example illustrates the basic steps listed above. It can also be used as a basis for your own experiments.
