

Installation of CUDA 12.3 and CuDNN 8.9 on Opensuse Leap 15.5 for Machine Learning

Machine Learning on a Linux system is no fun without a GPU and its parallel processing capabilities. On a system with an Nvidia card you need the basic Nvidia drivers plus additional libraries for optimal support of Deep Neural Networks and Linear Algebra operations on the GPU sub-processors. Keras and Tensorflow 2 [TF2], for example, use the CUDA and cuDNN libraries on your Nvidia GPU. Basic information can be found here:

This means that you must not only perform an installation of (proprietary) Nvidia drivers, but also of CUDA and cuDNN on your Linux system. As I have started to work with ResNet-110v2 and ResNet-164v2 variants lately I was interested whether I could get a combination of

  • TF 2.15 with Keras
  • the latest of Nvidia GPU drivers 545.29.06 – but see the Addendum at the end of the post and the warnings therein.
  • the latest CUDA-toolkit version 12.3
  • and cuDNN version 8.9.7

to work on an Opensuse Leap 15.5 system. This experiment ended successfully, although the present compatibility matrices on the Nvidia web pages do not yet include the named combination. While system wide installations of the CUDA-toolkit and cuDNN are no major problems, some additional settings of environment variables are required to make the libraries available for Python notebooks in Jupyterlab or classic Jupyter Notebooks (i.e. IPython based environments). These settings are not self-evident.
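
One way to supply such environment settings is via the "env" block of a Jupyter kernel spec (kernel.json), because the dynamic loader evaluates LD_LIBRARY_PATH when the kernel process starts – exporting variables from inside an already running notebook comes too late. The following is only a sketch; the paths are assumptions for a default system-wide toolkit installation and must be adapted to your system:

```python
import json
import os

# Hypothetical paths for a default system-wide CUDA toolkit install - adjust!
cuda_dir = "/usr/local/cuda-12.3"

# Environment block for a Jupyterlab kernel spec (kernel.json). These
# variables must be in place *before* the IPython kernel launches.
env = {
    "LD_LIBRARY_PATH": os.pathsep.join(
        [f"{cuda_dir}/lib64", "/usr/lib64", os.environ.get("LD_LIBRARY_PATH", "")]
    ).rstrip(os.pathsep),
    "XLA_FLAGS": f"--xla_gpu_cuda_data_dir={cuda_dir}",
}
print(json.dumps({"env": env}, indent=2))
```

The printed block can be merged into the kernel.json of the kernel you start your notebooks with.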

This post summarizes the most important steps of a standard system-wide installation of CUDA and cuDNN on an Opensuse Leap 15.5 system. I do not install TensorRT in this post. As long as you do not work with (pre-trained) LLMs you do not really need TensorRT.

Level of this post: Active ML user – advanced. You should know how RPM- and tar-based installations work on a Leap system. You should also have a working Python3 installation (in a virtual environment) and a Jupyter Notebook or (better) a Jupyterlab installation on your system to be able to perform ML-tests based on Keras. I do not discuss a Jupyter and Python installation in this post.

Limitations and requirements

GPU capabilities: You need a fairly new Nvidia graphics card to make optimal use of the latest CUDA features. In my case I tested with a Nvidia 4060 TI. Normally the drivers and libraries should detect the capabilities of older cards and adapt to them. But I have not tested with older graphics cards.

Disk space: CUDA and cuDNN require a substantial amount of disk space (almost 7 GiB) when you install the full CUDA-toolkit as it is recommended by NVIDIA.

Remark regarding warnings: Installing CUDA 12.3 and using it with Tensorflow 2.15 will presently (Jan. 2024) lead to warnings in your Python 3 notebooks. However, in my experience these warnings have no impact on the performance. My 4060 TI did its job in test calculations with convolutional Autoencoders and ResNets as expected – for ResNets even about 5% faster than with CUDA 11.2.
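
Independently of TF2, you can quickly check from Python whether the dynamic loader finds the CUDA runtime at all. The snippet below is a sketch; the soname is an assumption for a CUDA 12.x toolkit and your installation may differ:

```python
import ctypes

def cuda_runtime_available(libname="libcudart.so.12"):
    """Return True if the given CUDA runtime library can be loaded
    via the dynamic loader (i.e. is found on LD_LIBRARY_PATH or in
    the ldconfig cache)."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# True on a correctly configured CUDA 12.x system
print(cuda_runtime_available())
```

If this returns False while the toolkit is installed, the library paths discussed above are not visible to the process.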

Alternative installation methods: You may find information about a pure Python-based installation of CUDA via pip. See e.g. here: https://blog.tensorflow.org/2023/11/whats-new-in-tensorflow-2-15.html. While this potentially makes local user-specific installations easier, the disadvantage for multiple virtual Python environments is the resulting consumption of disk space. So, I still prefer a system-wide installation. It also seems that one should not mix both ways of installation – system-wide and virtual-environment specific. I have e.g. tried to install TensorRT via pip after a system-wide standard CUDA installation. The latter itself had worked. But after the additional TensorRT installation with pip my GPU could no longer be used by Keras/TF2-based ML code started from Jupyterlab notebooks.

Installation of basic Nvidia drivers

The Nvidia graphics card must already be supported for regular X or Wayland services on a Linux system. CUDA and cuDNN come on top.

Note: You need a fairly new Nvidia driver for CUDA 12.3! To get the latest drivers for an Opensuse system I install the proprietary Nvidia drivers from Opensuse’s Nvidia repository:

Nvidia Repository address for Leap 15.5: https://download.nvidia.com/opensuse/leap/15.5

Note that presently YaST2 has a bug (see here). You may need to use zypper on the command-line to add this repository to your package manager. See the man pages for zypper for the right syntax. In the end you should see the Nvidia repository in YaST2:


Using PyQt with QtAgg in Jupyterlab – IV – simple PyQt and MPL application with background worker and receiver threads

As you read this post you are probably interested in Machine Learning [ML] and hopefully in Linux systems as a ML-platform as well. This post series wants to guide you over a bridge between the standard tool-set of Python3 notebooks in Jupyterlab for the control of ML-algorithms and graphical Qt-applications on your Linux desktop. The objective is to become more independent of some limitations of the browser based Jupyterlab notebooks.

One aspect is the use of graphical Qt-based control elements (as e.g. buttons, etc.) in desktop windows. On the other hand we want to use background threads to produce (ML) data which we later, e.g. during training runs, display in Qt windows. Background threads will also enable us to run smaller code in other cells of our notebook during long ML-runs. We are also confident that we can keep up the interactivity of both our Qt windows and Jupyterlab during such runs.

We will later use the callback machinery of Keras based ML-runs to produce ML-data and other information about a running ML-algorithm in the background of Jupyterlab. These data will be sent to Matplotlib- and Qt callback-functions in Jupyterlab which then update Qt windows.

Knowledge gained so far …

During the previous posts we have gathered enough information to now build an example PyQt application, which utilizes two background threads.

We have seen that QtAgg, a backend bridge for producing Matplotlib [MPL] plots in Qt windows, can be used for full-fledged PyQt applications, too. In the first post we became familiar with some useful Qt-widgets and the general structure of Qt-Apps.

In the 2nd and 3rd posts we have learned that both Matplotlib figures and Qt-widgets must be controlled by the main thread associated with our Jupyterlab notebook. A Qt event loop is started in this thread by QtAgg for us. We have also noted that background threads controlled by QThread-objects can send signals which end up serialized in the Qt event queue of the main thread. From there they can be handled asynchronously, but in timely order by callbacks, which in turn update Qt-widgets for MPL-plots and other information. The 3rd post discussed a general pattern to employ both a raw data producing worker thread and a receiver thread to prepare the data for eventual foreground handling.

Objective of this post

In this post I will discuss a simple application that produces data with the help of two background threads according to the pattern discussed in the previous post. All data and information will periodically be sent from the background to callbacks in the main thread. Although we only use one main Qt window the structure of the application includes all key elements to serve as a blueprint for more complex situations. We will in particular discuss how to stop the background jobs and their threads in a regular way. An interesting side topic will be how one captures print output to stdout from background jobs.

Level of this post: Advanced. Some experience with Jupyterlab, QtAgg, Matplotlib and (asynchronous) PyQt is required. The first three posts of this series provide (in my opinion) a quick, though steep learning curve for PyQt newbies.

Application elements

Our PyQt application will contain three major elements in a vertical layout:

  • Two buttons to start and stop two background threads. These threads provide data for a sine-curve with steadily growing frequency and some related information text.
  • A Qt-widget for a Matplotlib figure to display the changing sine curve.
  • A series of QTextEdit widgets to display messages from the background and from callbacks in the foreground.

Our pattern requires the following threads: A “worker thread” periodically creates raw data and puts them into Python queues. A “receiver thread” reads out the queues and refines the data.

In our case the receiver thread will add additional information and data. Then signals are used to communicate with callbacks in the main thread. We send all data for widget and figure updates directly with the signals. This is done for demonstration purposes. We could also have used supplemental data queues for the purpose of inter-thread data exchange. For plotting we use Matplotlib and the related Figure.canvas-widget provided by QtAgg.

So, we have a main thread with a Qt event loop (and of course a loop for Jupyterlab REPL interaction) and two background threads which perform some (simple) asynchronous data production for us.

Our challenge: Qt and Matplotlib control with Python code in a Jupyterlab notebook

The application looks pretty simple. And its structure will indeed be simple. However, as always, the devil is in the details. In our particular situation with Jupyterlab we need to get control over the following tasks:

  • setup and start of two background threads – a worker thread and a receiver thread,
  • association of worker and receiver objects to the named threads with a respective affinity,
  • asynchronous inter-thread communication and data exchange via signals,
  • updates of Qt-widgets and integrated Matplotlib figures,
  • spinning the Qt-event-loop in the main thread to ensure quick widget updates,
  • a regular stop of thread activities and a removal of thread-related objects,
  • checking interactivity of both the Jupyterlab and the Qt-interface,
  • stability of the plot-production against potentially conflicting commands from the main thread.

All via code executed in cells of a Python notebook. An additional topic is:

  • capturing print-commands in the background and transmission of the text to the foreground.
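
The last point can be sketched with plain Python: a file-like object whose write() forwards text to a callback. In the real application the callback would emit a Qt signal carrying the text to the foreground; here, as a framework-free sketch, it simply collects the strings. Note that replacing sys.stdout is process-wide, i.e. it affects print() calls from all threads:

```python
import io
from contextlib import redirect_stdout

class EmittingStream(io.TextIOBase):
    """Minimal file-like object that forwards print output to a callback.
    In the Qt application the callback would emit a signal to the main
    thread; here it just collects the text (a sketch, not the post's code)."""
    def __init__(self, callback):
        self.callback = callback
    def write(self, text):
        self.callback(text)
        return len(text)

captured = []
with redirect_stdout(EmittingStream(captured.append)):
    print("message from a background job")

# Back on the real stdout: show what was captured
print("".join(captured), end="")
```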

Using PyQt with QtAgg in Jupyterlab – III – a simple pattern for background threads

We can use PyQt to organize output of Machine Learning applications in Qt-windows outside of Jupyterlab notebooks on a Linux desktop. PyQt also provides us with an option to put long running Python code as ML training and evaluation runs into the background of Jupyterlab and redirect graphical and text output to elements of Qt windows. Moving long lasting Python jobs and ML algorithms to the background of Jupyterlab would have the advantages

  • that we could run short code segments in other notebook cells in the meantime
  • and keep up the responsiveness of PyQt and Qt-based Matplotlib windows on the desktop.

In the first two posts of this series

we saw that PyQt and its GUI-widgets work perfectly together with Matplotlib’s backend QtAgg. Matplotlib figures are actually handled as special Qt widgets by QtAgg. We also gathered some information on threads in relation to Python and (Py)Qt. We understood that all (Py)Qt-GUI-classes and widgets must be run in the main thread of Jupyterlab and that neither Qt-widgets nor Matplotlib functions are thread-safe.

As a consequence we need some thread-safe, serializing communication method between background threads and the main thread. Qt-signals are well suited for this purpose as they end up in the event queue of target threads with fitting slots and respective functions. The event queue and the related event loop in the main thread of a Qt application enforce the required serialization for our widgets and Matplotlib figures.

In this post I want to discuss a simple pattern of how to put workload for data production and refinement into the background and how to trigger the updates of graphical PyQt windows from there. The pattern is based on elements discussed in the 2nd post of this series.

Pattern for the interaction of background threads with Qt objects and widgets in the foreground

You may have read about various thread-related patterns as the producer/consumer pattern or the sender/receiver pattern.

It might appear that the main thread of a Jupyter notebook with an integrated main Qt event loop would be a natural direct consumer or receiver of data produced in the background for graphical updates. One could therefore be tempted to think of a private queue as an instrument of serialization which is read out periodically from an object in the main thread.

However, what we cannot do is to run a loop with a time.sleep(interval)-function in a notebook cell in the main thread for periodic queue handling. The reason is that we do not want to block other code cells or the main event loop in our Python notebook. While it is true that time.sleep() suspends a thread, so another thread can run (under the control of the GIL), the problem remains that within the original thread other code execution is blocked. (Actually, we could circumvent this problem by utilizing asyncio in a Jupyterlab notebook. But this is yet another pattern for parallelization. We will look at it in another post series.)
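
As a brief illustration of the asyncio alternative mentioned in the parenthesis (a minimal sketch only; the full pattern is a topic for another post series): awaiting asyncio.sleep() instead of calling time.sleep() leaves the event loop free between polls.

```python
import asyncio

async def poll_queue(q_get, handle, interval=0.05, n_polls=3):
    # Await instead of time.sleep(): between polls the event loop stays
    # free for other coroutines (or, in Jupyterlab, other cell activity).
    for _ in range(n_polls):
        item = q_get()
        if item is not None:
            handle(item)
        await asyncio.sleep(interval)

# Plain-script usage; inside a notebook one would rather schedule the
# coroutine on the already running loop with asyncio.create_task().
items = iter([1, None, 2])
seen = []
asyncio.run(poll_queue(lambda: next(items), seen.append))
print(seen)  # [1, 2]
```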

Now we have two options:

  1. We may instead use the particular queue which is already handled asynchronously in Jupyterlab – namely the event queue started by QtAgg. We know already that signals from secondary (QThread-based) threads are transformed into Qt-events. We can send relevant data together with such signals (events) from the background. They are placed in the main Qt event queue and dispatched by the main event loop to callbacks.
  2. If we instead like to use a private queue for data exchange between a background and the main thread we would still use signals and respective slot functions in the main thread. We access our queue via a slot’s callback and read-out only one or a few new entries from there and work with them.

I will use the second option for the exchange of larger data objects in another post in this series. The pattern discussed in this post will be built upon the first option. We will nevertheless employ our own queue for data exchange – but this time between two threads in the background.

Short running callbacks in the main thread

According to what we learned in the last post, we must take care of the following:

The code of a callback (as well as of event handlers) in the main thread should be very limited in time and execute as fast as possible to create GUI updates.

Otherwise we would block the execution of the main event loop with our callback! And that would render other graphical objects on the desktop or in the notebook unresponsive. In addition it would also block running code in other cells.

This is really an important point: The integration of Qt with Jupyterlab via a hook for handling the Qt main event loop seemingly in parallel to the IPython kernel’s prompt loop is an important feature which guarantees responsiveness and which we do not want to spoil by our background-foreground-interaction.

This means that we should follow some rules to keep up responsiveness of Jupyterlab and QT-windows in the foreground, i.e. in the main thread of Jupyterlab:

  • All data which we want to display graphically in QT windows should already have been optimally prepared for plotting before the slot function uses them for QT widget or Matplotlib figure updates.
  • Slot functions (event handlers) should call QApplication.processEvents() to intermittently spin the event loop for the update of widgets.
  • The updates of PyQt widgets should only periodically be triggered via signals from the background. The signals can carry the prepared data with them. (If we nevertheless use a private queue then the callback in the main thread should only perform one queue-access via get() per received signal.)
  • The period by which signals are emitted should be relatively big compared to the event-loop timing and the typical processing of other events.
  • We should separate raw data production in the background from periodic signal creation and the related data transfer.
  • Data production in the background should be organized along relatively small batches if huge amounts of data are to be processed.
  • We should try to circumvent parallelization limitations due to the GIL whenever possible by using C/C++-based modules.
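
The queue-related rule above can be illustrated with a minimal, framework-free sketch: queue.Queue stands in for the private queue, and the Qt signal/slot machinery is omitted. The callback fetches at most one item per received signal and never blocks.

```python
import queue

data_q = queue.Queue()

def on_new_data(q):
    """Sketch of a slot callback: fetch at most one item per received
    signal and never block the (Qt) event loop waiting for the producer."""
    try:
        item = q.get_nowait()
    except queue.Empty:
        return None
    # ... here we would update a Qt widget / Matplotlib figure with `item` ...
    return item

data_q.put({"epoch": 1, "loss": 0.42})
print(on_new_data(data_q))  # {'epoch': 1, 'loss': 0.42}
print(on_new_data(data_q))  # None - queue already drained
```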

In the end it is all about getting data and timing right. Fortunately, the amount of data which we produce during ML training runs, and which we want to display on some foreground window, is relatively small (per training epoch).

A simple pattern for background jobs and intermediate PyQt application updates

An object or function in a “worker thread” calculates and provides raw data with a certain production rate. These data are put in a queue. An object or function in a “receiver thread” periodically reads out the next entries in the queue. The receiver knows what to do with these data for plotting and presentation. It handles them, modifies them if necessary and creates signals (including some update data for PyQt widgets). It forwards these signals to a (graphical) application in the main foreground thread. There they end up as events in the Qt event queue. Qt handles respective (signal-) events by so called “slots”, i.e. by callbacks for the original signals. The PyQt-application there has a graphical Qt-window that visualizes (some of) the data.
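
The data flow of this pattern can be sketched with Python’s standard threading and queue modules. A plain callback stands in for the Qt signal emission, and the QThread/slot machinery is deliberately left out – this is an analogy, not the PyQt code of the coming post:

```python
import threading
import queue
import time

raw_q = queue.Queue()

def worker(n_items):
    """Worker thread: produce raw data at some rate, put it into the queue."""
    for i in range(n_items):
        raw_q.put(i)
        time.sleep(0.01)
    raw_q.put(None)  # sentinel: no more data

results = []
def emit_to_main(payload):
    """Stand-in for a Qt signal: in the real app this would be emitted
    and end up in the main thread's Qt event queue."""
    results.append(payload)

def receiver():
    """Receiver thread: read raw data, refine it, forward it 'foreground'."""
    while True:
        item = raw_q.get()
        if item is None:
            break
        emit_to_main({"value": item, "squared": item * item})

w = threading.Thread(target=worker, args=(5,))
r = threading.Thread(target=receiver)
w.start(); r.start()
w.join(); r.join()
print(results[-1])  # {'value': 4, 'squared': 16}
```

The sentinel value shows one regular way to stop the receiver; in the PyQt version the stop will additionally involve quitting the QThreads.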


ResNet basics – I – problems of standard CNNs

Convolutional Neural Networks [CNNs] do a good job regarding the analysis of image or video data. They extract correlation patterns hidden in the numeric data of our media objects, e.g. images or videos. Thereby, they get an indirect access to (visual) properties of displayed physical objects – like e.g. industry tools, vehicles, human faces, ….

But there are also problems with standard CNNs. They have a tendency to eliminate some small scale patterns. Visually this leads to smoothing or smear-out effects. Due to an interference of applied filters artificial patterns can appear when CNN-based Autoencoders shall recreate or generate images after training. In addition: The number of sequential layers we can use within a CNN on a certain level of resolution is limited. This is on the one hand due to the number of parameters which rises quickly with the number of layers. On the other hand and equally important vanishing gradients can occur during error-back-propagation and cause convergence problems for the usually applied gradient descent method during training.

A significant improvement came with so called Deep Residual Neural Networks [ResNets]. In this post series I discuss the most important differences in comparison to standard CNNs. I start the series with a short presentation of some important elements of CNNs. In a second post I will directly turn to the structure of the so called ResNet V2 architecture [2].

To get a more consistent overview over the historical development I recommend to read the series of original papers [1], [2], [3] and a chapter in the book of R. Atienza [4] in addition. This post series only summarizes and comments the ideas in the named resources in a rather personal way. For me it serves as a preparation and overall documentation for Python programs. But I hope the posts will help some other readers to start working with ResNets, too. In a third post I will also look at the math discussed in some of the named papers on ResNets.

Level of this post: Advanced. You should be familiar with the concepts of CNNs and have some practical experience with this type of Artificial Neural Network. You should also be familiar with the Keras framework and standard layer-classes provided by this framework.

Elements of simple CNNs

A short repetition of a CNN’s core elements will later help us to better understand some important properties of ResNets. ResNets are in my opinion a natural extension of CNNs and will allow us to build really deep networks based on convolutional layers. The discussion in this post focuses on simple standard CNNs for image analysis. Note, however, that the application spectrum is much broader: 1D-CNNs can for example be used to detect patterns in sequential data flows such as texts.

We can use CNNs to detect patterns in images that depict objects belonging to certain object-classes. Objects of a class have some common properties. For real world tasks the spectrum of classes must of course be limited.

The idea is that there are detectable patterns which are characteristic of the object classes. Some people speak of characteristic features. The identification of such patterns or features then helps to classify objects in new images a trained CNN gets confronted with. Or a combination of patterns could help to recreate realistic object images. A CNN must therefore provide not only some mechanism to detect patterns, but also a mechanism for an internal pattern representation which can e.g. be used as basic information for a classification task.

We can safely assume that the patterns for objects of a certain class will show specific structures on different length scales. To cover a reasonable set of length scales we need to look at images at different levels of resolution. This is one task which a CNN must solve; certain elements of its layer architecture must ensure a systematic change in resolution and of the 2-dimensional length scales we look at.

Pattern detection itself is done by applying filters on sub-scales of the spatial dimensions covered by a certain level of resolution. The filtering is done by so called “Convolutional Layers“. A filter tests the overlap of a given object’s 2-dimensional structures with some filter-related periodic pattern on smaller scales. Relevant filter-parameters for optimal patterns are determined during the training of a CNN. The word “optimal” refers to the task the CNN shall eventually solve.

The basic structure of a CNN (e.g. to analyze the MNIST dataset) looks like this:

The sketched simple CNN consists of only three “Convolutional Layers”. Technically, the Keras framework provides a convolutional layer suited for 2-dimensional tensor data by a Python class “Conv2D“. I use this term below to indicate convolutional layers.

Each of our CNN’s Conv2D-layers comprises a series of rectangular arrays of artificial neurons. These arrays are called “maps” or sometimes also “feature maps“.

All maps of a Conv2D-layer have the same dimensions. The output signals of the neurons in a map together represent a filtered view on the original image data. The deeper a Conv2D-layer resides inside the CNN’s network the more filters had an impact on the input and output signals of the layer’s maps. (More precisely: of the neurons of the layer’s maps).

Resolution reduction (i.e. a shift to larger length scales) is in the depicted CNN explicitly done by intermittent pooling-layers. (An alternative would be that the Conv2D-layers themselves work with a stride parameter s = 2; see below.) The output of the innermost convolutional layer is flattened into a 1-dimensional array, which is then analyzed by some suitable sub-network (e.g. a tiny MLP).

Filters and kernels

Convolution in general corresponds to applying a (sub-scale) filter-function to another function. Mathematically we describe this by so called convolution integrals of the functions’ product (with a certain way of linking their arguments). A convolution integral measures the degree of overlap of a (multidimensional) function with a (multidimensional) filter-function. See here for an illustration.

As we are working with signals of distinct artificial neurons our filters must be applied to arrays of discrete signal values. The relevant arrays our filters work upon are the neural maps of a (previous) Conv2D-layer. A sub-scale filter operates sequentially on coherent and fitting sub-arrays of neurons of such a map. It defines an averaged signal of such a sub-array which is fed into a neuron of a map located in the following Conv2D-layer. By sub-scale filtering I mean that the dimensions of the filter-array are significantly smaller than the dimensions of the tested map. See the illustration of these points below.

The sub-scale filter of a Conv2D-layer is technically realized by an array of fixed parameter-values, a so called kernel. A filter’s kernel parameters determine how the signals of the neurons located in a covered sub-array of a map are to be modified before adding them up and feeding them into a target neuron. The parameters of a kernel are also called a filter’s weights.

Geometrically you can imagine the kernel as a (k x k)-array systematically moved across an array of [n x n] neurons of a map (with n > k). The kernel’s convolution operation consists of multiplying each filter-parameter with the signal of the underlying neuron and afterwards adding the results up. See the illustration below.
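
This operation is easy to spell out in code. The following is a didactic sketch in plain Python (a real framework of course implements this with optimized tensor operations): slide the (k x k) kernel over the [n x n] map with “valid” padding, multiply elementwise, sum up, move on by the stride.

```python
def apply_kernel(map_, kernel, stride=1):
    """Slide a (k x k) kernel over an [n x n] map ('valid' padding):
    elementwise multiply, sum, move by `stride` - the discrete convolution
    a Conv2D-layer performs per map/kernel pair."""
    n, k = len(map_), len(kernel)
    out_dim = (n - k) // stride + 1
    out = []
    for r in range(out_dim):
        row = []
        for c in range(out_dim):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * map_[r * stride + i][c * stride + j]
            row.append(acc)
        out.append(row)
    return out

# A (2x2) kernel applied to a [3x3] map -> a [2x2] output array
m = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
k = [[1, -1],
     [-1, 1]]
print(apply_kernel(m, k))  # [[2.0, -1.0], [-1.0, 2.0]]
```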

For each combination of a map M[N, i] of a layer LN with a map M[N+1, m] of the next layer L(N+1) there exists a specific kernel, which sequentially tests fitting sub-arrays of map M[N, i] . The filter is moved across map M[N, i] with a constant shift-distance called stride [s]. When the end of a row is reached the filter-array is moved vertically down to another row at distance s.

Note on the difference of kernel and map dimensions: The illustration shows that we have to distinguish between the dimensions of the kernel and the dimensions of the resulting maps. Throughout this post series we will denote kernel dimensions in round brackets, e.g. (5×5), while we refer to map dimensions with numbers in box brackets, e.g. [11×11].

In the image above map M[N, i] has a dimension of [6×6]. The filter is based on a (3×3) kernel-array. The target maps M[N+1, m] all have a dimension of [4×4], corresponding to a stride s=1 (and padding=”valid” as the kernel-array fits 4 times into the map). For details of strides and paddings please see [5] and [6].
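
The map dimensions of this example follow the standard output-size formulas for convolutions. A small sketch (the formulas match the behavior of Keras’ “valid” and “same” padding modes):

```python
def conv_output_dim(n, k, s=1, padding="valid"):
    """Spatial output dimension of a convolution over an [n x n] map
    with a (k x k) kernel and stride s."""
    if padding == "valid":
        return (n - k) // s + 1
    elif padding == "same":
        return -(-n // s)  # ceil(n / s)
    raise ValueError(padding)

print(conv_output_dim(6, 3, s=1))  # 4 -> the [4x4] maps of the example above
print(conv_output_dim(6, 3, s=2))  # 2
```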

Whilst moving with its assigned stride across a map M[N, i] the filter’s “kernel” mathematically invokes a (discrete) convolutional operation at each step. The resulting number is added to the results of other filters working on other maps M[N, j]. The sum is fed into a specific neuron of a target map M[N+1, m] (see the illustration above).

Thus, the output of a Conv2D-layer’s map is the result of filtered input coming from previous maps. The strength of the remaining average signal of a map indicates whether the input is consistent with a distinct pattern in the original input data. After having passed all previous filters up to the length scale of the innermost Conv2D-layer each map reacts selectively and strongly to a specific pattern, which can be rather complex (see pattern examples below).

Note that a filter is not something fixed a priori. Instead the weights of the filters (convolution kernels) are determined during a CNN’s training and weight optimization. Loss optimization dictates which filter weights are established during training and later used at inference, i.e. for the analysis of new images.

Note also that a filter (or its mathematical kernel) represents a repetitive sub-scale pattern. This leads to the fact that patterns detected on a specific length scale very often show a certain translation and a limited rotation invariance. This in turn is a basic requirement for a good generalization of a CNN-based algorithm.

A filter feeds neurons located in a map of a following Conv2D-layer. If a layer N has p maps and the following layer has q maps, then a neuron of a map M[N+1, m] receives the superposition of the outcome of (p*q) different filters (and respective kernels).
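
These (p*q) kernels also determine the number of trainable parameters of a Conv2D-layer. The arithmetic is simple, as this sketch shows (the formula matches the counts Keras reports via model.summary() for a standard Conv2D-layer with biases):

```python
def conv2d_param_count(p_in_maps, q_out_maps, k):
    """Number of trainable weights of a Conv2D-layer with p input maps,
    q output maps and (k x k) kernels: one kernel per (input, output)
    map pair plus one bias per output map."""
    return p_in_maps * q_out_maps * k * k + q_out_maps

# Example: 32 input maps, 64 output maps, (3x3) kernels
print(conv2d_param_count(32, 64, 3))  # 18496
```

Note that the count is independent of the map dimensions – a key advantage of convolutional over dense layers.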

Patterns and features

Patterns which fit some filters, of course appear on different length scales and thus at all Conv2D-layers. We first filter for small scale patterns, then for (overlayed) patterns on larger scales. A certain combination of patterns on all length scales investigated so far is represented by the output of the innermost maps.

All in all the neural activation of the maps at the innermost layers result from (surviving) signals which have passed a sequence of non-linearly interacting filters. (Non-linear due to the non-linearity of the neurons’ activation function.) A strong overall activation of an innermost map corresponds to a unique and characteristic pattern in the input image which “survived” the chain of filters on all investigated scales.

Therefore a map is sometimes also called a “feature map”. A feature refers to a distinct (and maybe not directly visible) pattern in the input image data to which an inner map reacts strongly.

Increasing number of maps with lower resolution

When reducing the length scales we typically open up space for more significant pattern combinations; the number of maps increases with each Conv-layer (with a stride s=2 or after a pooling layer). This is a very natural procedure due to filter combinatorics.

Examples of patterns detected for MNIST images

A CNN in the end detects and establishes patterns (inherent in the image data) which are relevant for solving a certain problem (e.g. classification or generative reconstruction). A funny thing is that these “feature patterns” can be visualized.

The next image shows the activation pattern (signal strengths) of the neurons of the 128 (3×3) innermost maps of a CNN that had been trained for the MNIST data and was confronted with an image displaying the digit “6”.

The other image shows some “featured” patterns to which six selected innermost maps react very sensitively and with a large averaged output after having been a trained on MNIST digit images.

 

These patterns obviously result from translations and some rotation of more elementary patterns. The third pattern seems to be useful for detecting “9”s at different positions on an image, the fourth pattern for the detection of images of “2”s. It is somewhat amusing what kind of patterns a CNN considers interesting to distinguish between digits!

If you are interested in how to create images of patterns to which the maps of the innermost Conv2D-layer react, see the book of F. Chollet on “Deep Learning with Python” [5]. See also a post of the physicist F. Graetz, “How to visualize convolutional features in 40 lines of code”, at “towardsdatascience.com”. For MNIST see my own posts on the visualization of filter-specific patterns in my linux-blog. I intend to describe and apply the required methods for layers of ResNets elsewhere in the present ML-blog.

Deep CNNs

The CNN depicted above is very elementary and not a really deep network. Anyone who has experimented with CNNs will probably have tried to use groups of Conv2D-layers on the same level of resolution and map-dimensions. And he/she probably will also have tried to stack such groups to get deeper networks. VGG-Net (see the literature, e.g. [2, 5, 6]) is a typical example of a deeper architecture. In a VGG-Net we have a group of sequential Conv2D-layers on each level of resolution – each with the same number of maps.

BUT: Such simple deep networks do not always give good results regarding error rates, convergence and computational time. The number of parameters rises quickly without a really satisfactory reward.

Problems of deep CNNs

In a standard CNN each inner convolutional layer works with data that were already filtered and modified by previous layers.

An inner filter cannot adapt to original data, but only to filtered information. And any filter eliminates some originally present information … This also occurs at transitions to layers working on larger dimensions, i.e. with maps of reduced resolution: The first filter working on a larger length scale (lower resolution) eliminates information which originally came from a pooling layer (or a Conv2D-layer with stride s=2). The original averaged data are not available to further layers working on the same new length scale.

Therefore, a standard CNN deals with a problem of rather fast information reduction. Furthermore, the maps in a group have no common point of reference – except overall loss optimization. Each filter can eventually only become an optimal one if the previous filtering layer has already found a reasonable solution. An individual layer cannot learn something by itself. This in turn means: During training the net must adapt as a whole, i.e. as a unit. This strong coupling of all layers can increase the number of required training epochs and also create convergence problems.

How could we work against these trends? Can we somehow support an adaptation of each layer to unfiltered data – i.e. support some learning which does not completely depend on previous layers? This is the topic of the next post in this series.

Conclusion

CNNs were building blocks in the history of Machine Learning for image analysis. Their basic elements as Conv2D-layers, filters and respective neural maps on different length scales (or resolutions) work well in networks whose depth is limited, i.e. when the total number of Conv2D-layers is small (3 to 10). The number of parameters rises quickly with a network’s depth and one encounters convergence problems. Experience shows that building really deep networks with Conv2D-layers requires additional architectural elements and layer-combinations. Such elements are cornerstones of Residual Networks. I will discuss them in the next post of this series. See

ResNet basics – II – ResNet V2 architecture

Literature

[1] K. He, X. Zhang, S. Ren , J. Sun, “Deep Residual Learning for Image Recognition”, 2015, arXiv:1512.03385v1
[2] K. He, X. Zhang, S. Ren , J. Sun, “Identity Mappings in Deep Residual Networks”, 2016, version 2 arXiv:1603.05027v2
[3] K. He, X. Zhang, S. Ren , J. Sun, “Identity Mappings in Deep Residual Networks”, 2016, version 3, arXiv:1603.05027v3
[4] R. Atienza, “Advanced Deep Learning with Tensorflow 2 and Keras”, 2nd edition, 2020, Packt Publishing Ltd., Birmingham, UK (see chapter 2)
[5] F. Chollet, “Deep Learning with Python”, 2017, Manning Publications, USA
[6] A. Geron, “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow”, 3rd ed., 2023, O’Reilly Media Inc., Sebastopol, CA, USA