Using PyQt with QtAgg in Jupyterlab – IV – simple PyQt and MPL application with background worker and receiver threads

As you read this post you are probably interested in Machine Learning [ML] and hopefully in Linux systems as an ML-platform as well. This post series aims to build a bridge between the standard tool-set of Python3 notebooks in Jupyterlab for controlling ML-algorithms and graphical Qt-applications on your Linux desktop. The objective is to become more independent of some limitations of the browser-based Jupyterlab notebooks.

One aspect is the use of graphical Qt-based control elements (e.g. buttons) in desktop windows. Another is the use of background threads to produce (ML) data which we later, e.g. during training runs, display in Qt windows. Background threads will also enable us to run smaller code snippets in other cells of our notebook during long ML-runs. We are also confident that we can keep up the interactivity of both our Qt windows and Jupyterlab during such runs.

We will later use the callback machinery of Keras based ML-runs to produce ML-data and other information about a running ML-algorithm in the background of Jupyterlab. These data will be sent to Matplotlib- and Qt callback-functions in Jupyterlab which then update Qt windows.

Knowledge gained so far …

During the previous posts we have gathered enough information to now build an example PyQt application, which utilizes two background threads.

We have seen that QtAgg, a backend bridge for producing Matplotlib [MPL] plots in Qt windows, can be used for full-fledged PyQt applications, too. In the first post we became familiar with some useful Qt-widgets and the general structure of Qt-Apps.

In the 2nd and 3rd posts we have learned that both Matplotlib figures and Qt-widgets must be controlled by the main thread associated with our Jupyterlab notebook. A Qt event loop is started in this thread by QtAgg for us. We have also noted that background threads controlled by QThread-objects can send signals which end up serialized in the Qt event queue of the main thread. From there they can be handled asynchronously, but in the order of their arrival, by callbacks which in turn update Qt-widgets for MPL-plots and other information. The 3rd post discussed a general pattern which employs both a raw-data-producing worker thread and a receiver thread that prepares the data for eventual foreground handling.

Objective of this post

In this post I will discuss a simple application that produces data with the help of two background threads according to the pattern discussed in the previous post. All data and information will periodically be sent from the background to callbacks in the main thread. Although we only use one main Qt window the structure of the application includes all key elements to serve as a blueprint for more complex situations. We will in particular discuss how to stop the background jobs and their threads in a regular way. An interesting side topic will be how one captures print output to stdout from background jobs.

Level of this post: Advanced. Some experience with Jupyterlab, QtAgg, Matplotlib and (asynchronous) PyQt is required. The first three posts of this series provide (in my opinion) a quick, though steep learning curve for PyQt newbies.

Application elements

Our PyQt application will contain three major elements in a vertical layout:

  • Two buttons to start and stop two background threads. These threads provide data for a sine-curve with steadily growing frequency and some related information text.
  • A Qt-widget for a Matplotlib figure to display the changing sine curve.
  • A series of QTextEdit widgets to display messages from the background and from callbacks in the foreground.

Our pattern requires the following threads: A “worker thread” periodically creates raw data and puts them into Python queues. A “receiver thread” reads out the queues and refines the data.

In our case the receiver thread will add additional information and data. Then signals are used to communicate with callbacks in the main thread. We send all data for widget and figure updates directly with the signals. This is done for demonstration purposes. We could also have used supplemental data queues for the purpose of inter-thread data exchange. For plotting we use Matplotlib and the related Figure.canvas-widget provided by QtAgg.

So, we have a main thread with a Qt event loop (and of course a loop for Jupyterlab REPL interaction) and two background threads which perform some (simple) asynchronous data production for us.
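To make this pattern concrete before we dive into the full application: below is a minimal sketch of the thread wiring. It assumes PyQt5 as the Qt binding behind QtAgg; all class, signal and variable names are my own choice for illustration and not necessarily those used in the application discussed in this post.

```python
# Minimal sketch of the worker/receiver pattern (assumption: PyQt5 as Qt binding;
# QtAgg already runs a Qt event loop in the notebook's main thread).
import queue
import time
from PyQt5.QtCore import QObject, QThread, pyqtSignal, pyqtSlot

class Worker(QObject):
    """Periodically produces raw data and puts it into a Python queue."""
    finished = pyqtSignal()

    def __init__(self, data_queue):
        super().__init__()
        self.data_queue = data_queue
        self._running = True

    @pyqtSlot()
    def run(self):
        t = 0.0
        while self._running:
            self.data_queue.put(t)          # raw data: here just a growing value
            t += 0.1
            time.sleep(0.2)
        self.finished.emit()

    def stop(self):
        self._running = False

class Receiver(QObject):
    """Reads the queue, refines the data and signals the main thread."""
    data_ready = pyqtSignal(float, str)     # refined data plus an info text
    finished = pyqtSignal()

    def __init__(self, data_queue):
        super().__init__()
        self.data_queue = data_queue
        self._running = True

    @pyqtSlot()
    def run(self):
        while self._running:
            try:
                t = self.data_queue.get(timeout=0.5)
            except queue.Empty:
                continue
            # "refinement": add an information text for the GUI
            self.data_ready.emit(t, f"received t = {t:.1f}")
        self.finished.emit()

    def stop(self):
        self._running = False

# Wiring, done in the main (notebook) thread:
data_queue = queue.Queue()
worker, receiver = Worker(data_queue), Receiver(data_queue)
worker_thread, receiver_thread = QThread(), QThread()
worker.moveToThread(worker_thread)          # set the thread affinity
receiver.moveToThread(receiver_thread)
worker_thread.started.connect(worker.run)
receiver_thread.started.connect(receiver.run)
# In the real application this signal would be connected to a slot of a QObject
# living in the main thread (e.g. to update the MPL figure and QTextEdit widgets),
# so that the call gets queued in the main Qt event loop.
receiver.data_ready.connect(print)
worker_thread.start()
receiver_thread.start()

def stop_threads():
    """Regular stop: end the loops, quit the threads, wait, then clean up."""
    worker.stop(); receiver.stop()
    worker_thread.quit(); receiver_thread.quit()
    worker_thread.wait(); receiver_thread.wait()
    worker_thread.deleteLater(); receiver_thread.deleteLater()
```

The stop_threads() function at the end indicates the kind of regular shutdown sequence we will need: first let the run() loops terminate, then quit and wait for the QThreads, and only afterwards schedule the thread objects for deletion.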

Our challenge: Qt and Matplotlib control with Python code in a Jupyterlab notebook

The application looks pretty simple. And its structure will indeed be simple. However, as always, the devil hides in the details. In our particular situation with Jupyterlab we need to get control over the following tasks:

  • setup and start of two background threads – a worker thread and a receiver thread,
  • association of worker and receiver objects with these threads (i.e. setting the respective thread affinity),
  • asynchronous inter-thread communication and data exchange via signals,
  • updates of Qt-widgets and integrated Matplotlib figures,
  • spinning the Qt-event-loop in the main thread to ensure quick widget updates,
  • a regular stop of thread activities and a removal of thread-related objects,
  • checking interactivity of both the Jupyterlab and the Qt-interface,
  • stability of the plot-production against potentially conflicting commands from the main thread.

All via code executed in cells of a Python notebook. An additional topic is:

  • capturing print-commands in the background and transmission of the text to the foreground.
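One possible way to capture such print output – an assumption of mine, not necessarily the mechanism used in the application discussed in this post – is a small file-like QObject whose write() method re-emits the text as a Qt signal:

```python
# Sketch of capturing print output from a background job (assumption: PyQt5;
# the class name EmittingStream is my own choice, not from the application).
import sys
from contextlib import redirect_stdout
from PyQt5.QtCore import QObject, pyqtSignal

class EmittingStream(QObject):
    """File-like object that forwards everything written to it via a Qt signal."""
    text_written = pyqtSignal(str)

    def write(self, text):
        if text.strip():                    # skip bare newlines
            self.text_written.emit(text)

    def flush(self):                        # file-like objects need a flush()
        pass

def background_job():
    stream = EmittingStream()
    # In the real application the signal would be connected to a QTextEdit
    # update slot in the main thread; here we just write to stderr.
    stream.text_written.connect(lambda s: sys.__stderr__.write("captured: " + s + "\n"))
    # Note: redirect_stdout swaps sys.stdout process-wide while active,
    # i.e. it affects prints from all threads during that period.
    with redirect_stdout(stream):
        print("Hello from the background job")
```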

The Meaning of Object Features in different ML-Contexts

When I gave a few introductory courses on basic Machine Learning [ML] algorithms in 2022, I sometimes ran into a discussion about “features”. The discussions were not only triggered by my personal definition, but also by some introductory books on ML the attendees had read. Across such textbooks, and even within a single book on ML, authors have a tendency to use the term “features” in different contexts of ML-algorithms and in particular Artificial Neural Networks [ANN]. Unfortunately, the meaning of the term differs somewhat between these contexts. This can lead to misunderstandings.

With this post I want to specify the most important contexts in which the term “feature” appears, comment on the differences and suggest some measures to distinguish a bit better.

Level of the post: Advanced. You should already be familiar with ML and ANNs, pattern detection and the respective variable spaces.

Features in different contexts

In general a feature addresses some property of an object. One would think that an object of interest for some ML application can be described precisely enough by quantifying its relevant properties. How then can it be that object properties get a different meaning in different contexts? The following considerations help to understand why.

We need numeric object data as input for ML-algorithms. But do we always get direct information about the physical properties of an object? Or is the information about an important feature only indirectly accessible? In this context media may play a role. We also must take into account that the processes of a trained ML-algorithm typically map an object’s input data to a point in some abstract multidimensional space which is spanned by internal and abstract variables of the algorithm. These variables could also be regarded as (abstract) “features” of an object. In addition, ML-algorithms detect and extract (sometimes hidden) patterns in the input data of objects. Such a pattern is also often called a “feature” characterizing a whole class of objects.

Guided by these thoughts I distinguish the following four main contexts regarding different meanings of the term “feature“:

Context 1 – input and training data based on selected and quantified object properties
The first relevant context concerns the representation of an object in a useful way for a numerical ML-algorithm. A “feature” is a quantifiable property of a class of objects to which we want to apply an algorithm. We define a single object by an ordered array (= tensor) providing numeric values for a set of selected, relevant properties. Such an array represents our object numerically and can be used as input to a computer program, which realizes an ML-algorithm. If numeric values of the properties are available for a whole bunch of objects we can use them as training data for our algorithm.

Mathematically, we interpret a property as a variable which takes a specific value for a selected single object. Thus the numerical representation of an object requires a set of multiple variables. Therefore, we often present the available original training data of our objects as data points in a multidimensional space with an Euclidean coordinate system [ECS]. Each axis of the ECS represents one of our feature variables by which we describe our objects. Sometimes this space is called the (original) “feature space” of the objects. Actually, it is a space to represent numeric training data available for our objects.
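As a tiny numerical illustration of context 1 (the property values below are invented for illustration only):

```python
# Context 1: each object is an ordered array of quantified properties.
import numpy as np

# feature variables: length [m], height [m], weight [t], ear size [m]
feature_names = ["length", "height", "weight", "ear_size"]

# three objects => three data points in a 4-dimensional variable space
X_train = np.array([
    [6.5, 3.2, 5.4, 1.2],   # object 1
    [5.9, 2.8, 4.1, 0.9],   # object 2
    [6.8, 3.4, 6.0, 1.3],   # object 3
])
print(X_train.shape)        # (3, 4): 3 objects, 4 feature variables
```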

Context 2 – object information embedded in the data of some medium
What set of “properties” is used to define quantified input data of objects often depends on the way or form by which we register information about our objects. During information gathering media (as images, videos, sound recordings, …) can play a decisive role.

Let us take an example: We may want to train an ML-algorithm to distinguish between classes of elephants, e.g. to distinguish an African from an Indian elephant. But relevant data of a bunch of elephants may be available in the form of pictures only – one image for each of the elephants. We may not have any direct numeric data for an elephant’s properties like its length, height, weight, ear size, … The data of relevant physical properties of elephants would in our case be indirectly embedded in media data.

In such a case we would probably use pixel values as our training data. I.e., the “features” our ML-algorithm gets confronted with would be provided as arrays of pixel values – corresponding to one variable for each of the image’s color pixels. Yet, the objects we really are interested in would be the photographed elephants. Our algorithm should distinguish between (depicted) elephants just from analyzing a respective image. The distinctive features must then be evaluated indirectly.
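A short numerical illustration of this shift in meaning (the image shape is chosen arbitrarily for illustration):

```python
# Context 2: the "features" the algorithm actually sees are pixel values -
# one variable per color pixel of the image.
import numpy as np

img = np.random.rand(96, 96, 3)   # stand-in for one photo of an elephant
x = img.reshape(-1)               # 96 * 96 * 3 = 27648 pixel variables
print(x.shape)                    # (27648,)
```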

Such a situation opens room for misunderstandings regarding the objects the ML-algorithm really deals with (see the discussion below).

Context 3 – patterns extracted from object data
A “feature” is also used as a term to qualify a pattern which a ML-algorithm may somehow have detected in and extracted from some original training data (by some tricky mathematical methods).

Such pattern-based “features” summarize correlations in the original training data. The detected patterns can be abstract ones or they may correspond to physical properties of the objects. These features may not have been directly referenced by the training data presented to the ML-algorithm, but could have been detected during the training process. E.g. by the evaluation of correlations.

In such a case these features were hidden in the training data. Think again of images of elephant faces for which the training data were chosen to be pixel values: A pattern-based “feature” a capable algorithm detects may then be something like an elephant’s “nose” or “trunk”. More precisely: a nose-like pattern of positional correlations of certain pixel values.
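As a toy illustration of such a positional pixel correlation (the small vertical-edge filter below is an arbitrary stand-in for a learned pattern, not a real “trunk detector”):

```python
# Context 3: a pattern-based "feature" as the response of a small filter
# that measures the overlap of image patches with a fixed pixel pattern.
import numpy as np

img = np.zeros((8, 8)); img[:, 4:] = 1.0      # toy image with a vertical edge
kern = np.array([-1.0, 1.0])                  # responds to left-to-right steps

resp = np.zeros((8, 7))
for i in range(8):
    for j in range(7):
        resp[i, j] = np.sum(img[i, j:j + 2] * kern)   # overlap with the pattern
print(resp.max())   # strong response (1.0) exactly where the pattern occurs
```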

But in other cases the detected pattern-based features may relate to correlations between data which correspond to no concrete single physical property, but to more or less abstract property relations. E.g., there could be a relation between the size of an elephant and its date of birth, because after some date the food was changed or a genetic modification took hold in a group of elephants.

Context 4 – features as abstract variables of latent representation spaces of objects
The internal processes of many ML-algorithms, especially neural networks, map the data points (or vectors) representing objects in the variable space of the input data to data points (or vectors) in an internal or latent representation space. An ML-algorithm, e.g. an ANN, can be regarded as a complicated function mapping a vector of a high-dimensional vector space to a vector of another vector space with a typically lower number of dimensions.

In the case of ANNs these internal representation spaces relate to vectorized data which are produced by the neurons of a special (flat) layer. Such a layer typically follows a sequence of other analyzing and processing layers and summarizes their results in a condensed way. The output of each of the neurons in this special inner layer can be regarded as a variable for a vector component. The processed data for a specific object thus lead to specific values corresponding to data points in an abstract multidimensional space. If such data are externalized and not directly fed to further internal and classifying network parts, then we speak of an accessible latent space.

The variables that span an internal or latent object representation space are abstract ones – but they can sometimes also measure the overlap with patterns in physical properties of the objects. In the case of Convolutional Neural Networks [CNNs] an internal or latent representation space condenses information about detected patterns and the degree of overlap of a given object with such a pattern. In this sense internal or latent representation (vector) spaces may also represent secondary, pattern based object features in the sense of context 3.

An internal representation space for objects is in some ANN-contexts (especially regarding Natural Language Processing by ANNs) also called an “embedding space“. The difference in my understanding lies in the way the mapping of training data into a representational space is done: In the case of an embedding space mapping is done by neuron layers close to the input side of a neural network. I.e. the input data are first mapped to an internal representation space and are afterward processed by other network layers. The relevant network parameters for the initial mapping (= embedding) are achieved during training via some parameter optimization. In the case of a latent or inner representation space we instead use data produced by neurons which are members of some special inner layer following major processing layers (as e.g. convolutional or residual layers).
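For context 4, a minimal Keras sketch of how one can read out such an inner layer as a latent representation (the layer structure and sizes below are my own choice for illustration):

```python
# A small CNN classifier; the output of the inner Dense layer named "latent"
# serves as the latent representation of an input object (image).
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(96, 96, 3))
x = layers.Conv2D(32, 3, activation="relu")(inp)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.Flatten()(x)
latent = layers.Dense(16, activation="relu", name="latent")(x)   # latent variables
out = layers.Dense(2, activation="softmax")(latent)              # classifier head

model = models.Model(inp, out)
# A second model that maps input data directly to the latent variable space:
latent_model = models.Model(inp, model.get_layer("latent").output)
```

After training the classifier, latent_model.predict() maps each input object to a point in the 16-dimensional latent variable space.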

See a Wikipedia article about latent spaces which distinguishes between the “feature space” of context 1 and the “latent feature space” of context 4.

A topic for confusion

The example of image data of elephants makes it clear why we often must define precisely what we mean when we speak about “features” of “objects”. In particular, we must be careful to distinguish between media objects and objects indirectly presented by our media objects. We also must address patterns as particular features and internal object representations. Key questions are:

Do we speak of quantified physical and abstract features of the objects we are interested in? Or do media objects play a role whose features encapsulate the data of the really relevant objects? Or do we speak of patterns? Or do we refer to variables of internal or latent feature spaces?

One widespread source of confusion is that we mix up a media object and the object encoded in the media data. We speak of “elephants” when the real objects an ML-algorithm is confronted with are the images of elephants. Then an algorithm classifying elephants on the basis of image data does not really distinguish between different classes of elephants (or other photographed objects). Instead it actually distinguishes between images with different kinds of pixel correlations. If we are lucky, the detected pixel correlation patterns reflect some information about a single feature or a combination of multiple (physical) features of the elephants (or other imaged objects).

Note that the interpretation of the input data and the latent data of an ML-algorithm would change substantially if we had not used images of elephants and the respective pixel values as training data, but data directly quantifying physical properties of an elephant – as e.g. the length of its trunk – to define our “objects”.

But a ML-algorithm may also detect patterns which the human brain cannot even see in pictures of objects. Then the algorithm would work with features in context 2, 3, 4 for which we may not even have a name. The features at least in context 3 and 4 in the end are always abstract – and chosen by the algorithm under optimization criteria.

The interesting thing is that the feature variables chosen to be our training data may totally obscure the really relevant features and respective data of the described objects. If we gave a human being a series of pixel value data and did not show the respective image in the usual 2-dimensional and colored way, we would have enormous difficulties to extract patterns of the photographed elephants. This is exactly the situation an artificial neural network is confronted with.

Be more precise when describing what you mean by a feature

We can resolve some of the confusion about features by specifying more precisely what we talk about. Personally, I would like to completely drop the word “feature space” for the variable space of training and input data to a ML-algorithm. Regarding the training data the terms “input or training variables” and “variable space of training data” seem much more appropriate. If required we should at least speak of “training data features” or “input data features”.

Concerning context 2 we must clarify what the primary objects whose feature data we feed into an algorithm are – and what the secondary objects are and how their features are indirectly encoded in the primary objects. We must also say which kind of objects we are interested in. Such a clarifying distinction is a must in the context of media data.

Context 3 related features, i.e. patterns, are in my opinion a helpful construction, in particular for describing aspects of CNNs. But such features must clearly be characterized as detected (correlation) patterns in the original input data. It should also be said, in which way such a pattern-based feature impacts the output of the algorithm. In case of CNNs referring to “patterns of feature maps” could be helpful to indicate that certain (sub-) layers of a CNN react strongly to a certain type of input pattern.

Regarding “features” in context 4 I think that the differences between internal and latent data representations or between “embedded” and “latent” representation spaces are not really decisive. We can in general speak of a “latent space” when we mean a multidimensional space onto which some operational processes of a trained ML-algorithm or ANN map the input data of objects. Regarding the variables defining the respective vector space I would prefer to talk of “related latent variables” and a respective “latent variable space”. If we absolutely must discuss “features”, we should at least use the term “latent features”.

Conclusion

Referring to features during a discussion of ML-algorithms, their input, output and internal or latent object representations may cause trouble if the term is not explained precisely. There are at least four contexts in which the term “feature” has a different meaning. Sometimes it appears to be better to avoid the term altogether and instead refer to the relevant mathematical variables. Careful use is of particular importance if we describe our objects of interest via media such as images.

 

Preliminary test of a Nvidia RTX 4060 TI 16GB with neural networks

Recently I had the opportunity to test a Nvidia RTX 4060 TI (vendor: MSI, model: Ventus) on my Linux system against a Geforce GTX 960.

For private consumers like me who are not interested in gaming, but in Machine Learning [ML], this type of card can be interesting. I name three reasons:

  • the price level
  • the amount of available VRAM (16 GB)
  • the power consumption (which again has to do with a price, namely the one you have to pay for energy).

Some of the readers of this blog may miss a criterion like “performance”. The reason is that I regard the VRAM criterion as more important than raw GPU power. I have commented on this in another post already. See Criteria for ML capable graphic cards: Amount of VRAM or raw GPU power?

This post provides some preliminary impressions and measured performance factors comparing the RTX 4060 TI to a previously used Nvidia Geforce 960 GTX. The factors were derived from training runs of Convolutional Neural Networks and Autoencoders used for object classification on images and generative tasks. These ML-runs were also used to measure the maximum temperature level and subjectively compare the fan noise of the 4060 TI vs. the GTX 960. So, the difference to what you find on other sites comparing GPUs is that I focused on ML-related tests and not on video games or game specific benchmarks.

A compromise regarding the value for money – some theoretical values for 40XX-cards

Nvidia, in my opinion, exploits its monopoly regarding ML-capable cards maximally; so such cards still are very expensive. Finding an affordable compromise requires comparing the specifications of the GPU variants. The basic specification data for the RTX 4060 TI 16GB (and other variants of the Ada Lovelace architecture) can be found here.

A 4060 TI cannot provide the GPU performance of a 4070, 4080 or 4090. In the following comparisons of a few specifications I leave out the RTX 4090 as the top model with a price tag above 1700 € presently in Germany. The 4090 is a card for ML-professionals or rich enthusiasts, but not for a normal private consumer like myself.

VRAM: The RTX 4060 TI is one of the 40XX-cards which provide 16 GB VRAM. The RTX 4080 also comes with 16 GB VRAM, so it is a direct competitor for the RTX 4060 TI 16GB regarding value for money. Note that there is also a 4060 TI variant available which only provides 8 GB. The RTX 4070 TI has much in common with a 4080, but a lower amount of VRAM, namely 12 GB. All in all, Nvidia’s product policy for the RTX 4070 (TI) is a bit questionable as it does not really address the requirements of ML-people. But the lack of VRAM on the RTX 4070 (TI) was criticized by the gamer community, too.

Price: The price tag of the 4060 TI is at or below 470 € (in Germany) presently (Oct. 2023). I.e. an RTX 4060 TI costs roughly 37% and 55% of what you have to pay for an RTX 4080 (1250 €) and an RTX 4070 TI (880 €), respectively. I took the prices from the German Amazon site.

TDP: A RTX 4080 can draw up to 320 Watt in power consumption, a RTX 4070 TI up to 285 Watt and a RTX 4090 up to 450 Watt. The RTX 4060 TI, in contrast, requests only up to 165 Watt (nominal TDP).

Speed/Performance: According to published consumer and game-based benchmarks the RTX 4080 GPU is roughly a factor of 2 to 2.2 faster than the “RTX 4060 TI 16GB”. Consistently, the RTX 4080 has roughly a factor of 2.25 more cores / tensor cores. The memory interface of the RTX 4080 is 256 bit wide vs. only 128 bit for the 4060 TI. Note also that the PCIe link width for the 4060 TI is only x8, instead of x16 for an RTX 4070 TI or an RTX 4080. However, the memory clock speed of the RTX 4060 TI is more than twice as high as for the RTX 4080. An RTX 4070 TI appears to be around a factor of 1.6 faster than an RTX 4060 TI.

A 4060 TI is only around 20% effectively faster than its older counterpart, the RTX 3060 TI. But the RTX 3060 TI has a significantly higher power consumption (up to 200 Watt), only 8 GB VRAM, and regarding supported standards and operations it is behind the 4060 TI.

Summary: Regarding specifications the RTX 4060 TI in comparison to a RTX 4070 TI and a RTX 4080 certainly is a compromise regarding performance vs. price tag. But:

  • Even with two 4060 TI you would be well below the price level of one RTX 4080. But with two 4060 TI cards you would get 32 GB of VRAM in total – which is a decisive factor for some ML experiments. As VRAM is a critical factor, two 4060 TI would also almost certainly be a better deal than one 4070 TI. So, if you are just starting with ML, think carefully: the option to extend your experiments later to a combination of two RTX 4060 TI is a relevant one.
  • Regarding low power consumption and related heat and noise levels a single 4060 TI is, in theory, without match. For me as a budding ML addict the question of power consumption is a decisive one – I do not want to care too much about cooling and my energy bill when the GPU is under load for some hours.

Performance of Machine Learning runs on a RTX 4060 TI vs. a GTX 960

My main interest in doing some preliminary tests myself was what I would gain in comparison to my old GTX 960 (vendor: Gigabyte) when running training runs for neural networks, more specifically CNNs with 9 million up to 38 million parameters. My old GTX 960 had 4 GB VRAM only. So, all experiments included image data transfers from the RAM to the GPU’s VRAM via an ImageDataGenerator()-batch-pipeline. An important factor for the performance, therefore, is that the RAM is big enough to contain all relevant image tensors – in my test cases between 60,000 and 200,000. I did not read any data from disk, but had them preloaded in the RAM.

Due to its small amount of VRAM the GTX 960 certainly is no reasonable card these days for really deep ML-networks and respective algorithms. It is also too slow for many kinds of experiments with transformers. However, the GTX 960 has a low TDP of 120 Watt.

Expectations from standard benchmarks: Regarding expectations for the difference in performance between a 960 GTX and a RTX 4060 TI you may have a look at test results at tomshardware.com: see here. From the numbers we find there one may expect the RTX 4060 TI to show a factor of 4.4 in performance gain vs. a GTX 960.

However, ML-tests involve different operations than video games, namely more complicated tensor operations. The ML-performance, therefore, depends on many factors – e.g. on the tensor framework used, which in my case was Tensorflow2 with Keras as a frontend. The performance of Nvidia cards also depends on the CUDA and cuDNN drivers (including optimized linear algebra libraries for deep neural networks). While CUDA has a current version of 12.2, I have done my tests with the older version CUDA 11.2. The proprietary Nvidia driver version was 535.113.01. Data were measured via the nvidia-settings app and “watch -n1.0 nvidia-smi” on a terminal. The system was run under Opensuse Leap 15.4.

As heavy ML tests also involve intermittent data transfers between the GPU and the standard RAM, the system’s PCIe environment, the CPU and the RAM’s clock frequency also have an impact. So the numbers given below, which were evaluated on a system with a Z170 board and an i7-6700K processor, may not indicate the achievable optimum.

Performance for ML test cases

I used around 5 different test cases, either directly based on CNNs or using CNN-based Autoencoders, trained for different ML tasks regarding image analysis, image classification and generative tasks. Different layer structures including normalization and drop-out layers were used. The numbers of parameters to be optimized were between 9.3 and 38 million. I used between 60,000 and 210,000 images per run. The batch size during training was limited to 128 image tensors at first. The color images had a resolution of 96×96 px. Two of the test cases used almost 4.0 GB of the available VRAM on the GTX 960 at this batch size. For some of the tests I raised the batch size later to 256. This, of course, increased the VRAM usage by at least a factor of 2.
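For readers who want to set up a comparable test, a minimal sketch of such an in-RAM batch pipeline could look as follows. The image shape, batch size and the use of ImageDataGenerator() follow the text; the array sizes are reduced here and the rest is my assumption, not the exact test code.

```python
# Sketch of an in-RAM ImageDataGenerator() batch pipeline as used for the tests.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 10,000 dummy color images of 96x96 px (the real runs used 60,000 - 210,000
# images preloaded in RAM; no reads from disk during training).
x_train = np.random.rand(10000, 96, 96, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(10000,))

gen = ImageDataGenerator()                    # no augmentation, just batching
flow = gen.flow(x_train, y_train, batch_size=128)
# model.fit(flow, epochs=..., steps_per_epoch=len(flow))
```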

In all tests I found that the GPU utilization remained above 93% on both cards.

The relevant turnaround times for my selected training and evaluation runs differed by a factor of roughly 3.5 to 3.8 between the GTX 960 and the RTX 4060 TI – i.e. the RTX 4060 TI is on average around a factor of 3.7 faster than a GTX 960.

Raising the batch size to 256 tensors – transferred via an ImageDataGenerator()-pipeline from the RAM to the GPU’s VRAM and handled there as a step unit during my training epochs – gave an additional rise in the performance of the 4060 TI: it became faster than the GTX 960 by a factor of roughly 4.0 to 4.2. This is not a huge additional gain, but certainly significant.

Interestingly, doubling the batch size raised the VRAM consumption not only by a factor of 2, but by a factor of 2.5 in some test cases. So, to gain a factor of 4 in performance in comparison to a GTX 960 you may have to use much more VRAM on the RTX 4060 TI than on the GTX 960.

So regarding reasonable ML-tests you may not gain more than a factor of 3.6 to 4.1 in effective performance by replacing a GTX 960 with a RTX 4060 TI.

Power consumption, GPU temperatures and fan noise

Energy consumption during KDE desktop usage: Regarding power consumption under normal Linux conditions I have very positive news: I work with a Linux based KDE desktop stretched across 3 screens (two with 2560×1440 px and one with 1920×1200 px resolution). The power consumption of the 4060 TI was only around 12.4 to 14.0 Watt during standard desktop operations. This is significantly less than for the 960 GTX, which used 30 to 32 Watt.

Energy consumption under full load: During my ML-experiments the GTX 960 consumed 108 Watt on average whereas the 4060 TI consumed between 117 Watt and 125 Watt. A factor of 1.16 in energy consumption during load phases for a speed gain of more than a factor of 3.6 and a 4 times bigger VRAM is more than acceptable.

The highest performance level (level 4 in nvidia-settings) was only reached temporarily during the ML-runs.

GPU temperature: The temperature of the 4060 TI never exceeded a maximum level of 71° Celsius under full load. This was slightly lower than for the GTX 960 (73° Celsius).
Note, however, that these temperatures require free space below the GPU fans, i.e. the next PCIe slot should not be occupied by large cards which produce much heat themselves. See a section below for more information.

Fan noise: The two fans of the MSI Ventus model do a proper job. The RPM never rose beyond 1875 – at room temperatures around 23° Celsius. The adaptive fan control works with a slight delay regarding the temperature rise vs. time. This appears to be reasonable as the temperature could drop again during a time slice. In Nov. 2023 new TI models with Torx fans will appear which may be even more effective.

Coming from a phase of low load the fans only start rotating when the GPU temperature reaches more than 60° Celsius. This guarantees a zero noise level under standard operation conditions.

Under standard usage conditions (KDE desktop, 3 attached screens) and depending on the temperature inside the case, the GPU temperature without any active fan sometimes rose from 45° Celsius to 51° Celsius – but not more. With a bit of fan rotation at 1200 RPM the temperature at once went down to 32° Celsius. This is acceptable, as my Alpenföhn CPU cooler stretches down to 2 cm above the graphics card and my case is relatively densely packed with multiple HDs, SSDs, sound cards and a RAID controller.

Under stress conditions the GPU fans were only audible outside the PC case when I deliberately concentrated on hearing them. I could not hear any coil whine from the graphics card.


Handling, size and other aspects

Regarding height, the graphics card occupies the space of two PCIe slots. However, at roughly 20 cm its length is much less than that of the GTX 960. The width is around 12 cm.

Important Warning: The compact size of the card comes with a disadvantage:
Another card in the next PCIe slot below may cover a lot of the GPU’s fan area, which is not good for an efficient cooling of the GPU. I had to move other PCIe cards (in particular a heat-producing RAID controller) to other slots to free some space below the graphics card. I noticed a drop in GPU temperature under stress conditions of up to 7° Celsius afterward. So, this is important for ML-interested users.

Conclusion

The RTX 4060 TI with 16GB VRAM is not a dream card for ML-focused users. But it is a reasonable compromise and offers a lot of value for your money. The 16 GB VRAM are especially valuable to extend the range of ML-experiments beyond those which can be done on cards with only 4 GB or 8 GB VRAM. So it is even interesting for users of an RTX 3060 TI or a good old Geforce 1080 (TI) – although the improvement in performance will then be less than a factor of 1.2 and 1.4, respectively. Users of old cards like a Geforce GTX 960 will experience a performance jump by at least a factor of 3.5 up to 4.2 regarding ML-tasks.

The relatively low power consumption of around 125 Watt and its silent operation – even under heavy load – are major plus points of the RTX 4060 TI. And there is the option to add a second 4060 TI card to your ML-system when prices drop.

Links

https://gpu.userbenchmark.com/Compare/Nvidia-RTX-4080-vs-Nvidia-RTX-4060-Ti/4138vs4149

https://versus.com/de/nvidia-geforce-rtx-4060-ti-16gb-vs-nvidia-geforce-rtx-4080-16gb