Criteria for ML-capable graphics cards: Amount of VRAM or raw GPU power?

Some of my readers may be interested in having a private environment to study Machine Learning [ML] techniques and perform experiments with complex Neural Network algorithms. I am not talking about AI professionals, but about people (like myself) who are students or privately interested in ML techniques, and who have a limited budget for their AI and ML interests.

Even if you are not a professional, sooner or later you may find that a new and better suited graphics card is required for your ML studies. As the prices for graphics cards of the monopolist in this market segment, namely Nvidia, are still extremely high, the question arises what your most important criterion for choosing a certain type of card should be.

In my opinion the most relevant criteria to consider and weigh in a purchase decision are:

1. The price level. (I intentionally avoid adjectives such as “reasonable” or “relatively moderate”, as Nvidia in my opinion uses its monopoly position to maximize its profit.)
2. The amount of available VRAM.
3. Raw GPU power and performance in terms of characteristic HW parameters such as the GPU clock frequency. (But note that the performance of a given ML algorithm may depend on many more parameters and should always be evaluated with the help of well-defined test cases. The amount of VRAM and the total turnaround performance of many ML algorithms may show a strong correlation.)
4. Energy consumption (which adds a secondary price tag, namely the running energy costs).

For private persons, who may have a very limited budget for their ML hobby, criterion 1 will always be dominant. But the variety of graphics cards available for a given chip generation, and the respective variation of HW properties and price tags, is large. Most people would like to see criteria 2 and 3 fulfilled at the same time, but may find the respective cards unaffordable. Criterion 4, in my opinion, is often badly underestimated.

With this post I want to briefly discuss criteria 2 to 4 and give you a recommendation regarding their relative weight.

Power consumption – the underestimated criterion

When you start performing training runs for modern Artificial Neural Networks [ANNs] you will soon learn that the GPU usage rises to above 90%. I have sometimes seen a permanent GPU load of 95%. When you watch the power consumption of your graphics card during such runs you may find that it also reaches above 85% to 90% of the nominal maximum power consumption value. Without any overclocking.
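If you want to watch these values yourself, the following is a minimal Python sketch that reads GPU load, power draw, power limit and temperature from the `nvidia-smi` command line tool. It assumes an Nvidia card with the proprietary driver installed; the function names and the dictionary keys are my own choices for illustration. Some cards report "[N/A]" for individual fields, which this sketch does not handle.

```python
import subprocess

# Query fields understood by nvidia-smi (see: nvidia-smi --help-query-gpu)
QUERY_FIELDS = "utilization.gpu,power.draw,power.limit,temperature.gpu"


def parse_gpu_line(line):
    """Parse one CSV line returned by nvidia-smi for QUERY_FIELDS."""
    util, draw, limit, temp = (float(v) for v in line.split(","))
    return {
        "util_percent": util,            # GPU load in percent
        "power_watt": draw,              # current power draw in watts
        "power_limit_watt": limit,       # nominal power limit in watts
        "power_fraction": draw / limit,  # share of the power limit in use
        "temp_celsius": temp,            # GPU core temperature
    }


def read_gpu_stats():
    """Query all GPUs once; needs nvidia-smi on the PATH."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={QUERY_FIELDS}",
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.strip().splitlines()]
```

Calling `read_gpu_stats()` every few seconds during a training run gives a simple log of how close the card operates to its power limit.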

In 2020 I used a lot of my free time to work with different types of Deep Learning networks on a modest card, namely a GTX 960 (4GB). Although the power draw of such a card is limited to around 160 watts, I was really astonished by my energy bill at the end of the year: my ML interest increased my energy expenses in 2020 by about 30%, which is more than significant. Taking into account that more powerful cards of each chip generation may consume 300 watts or more, I would like to warn private ML enthusiasts:

Do not underestimate the energy consumption that ML experiments may cause, even on moderate graphics cards. ML is an expensive hobby for private enthusiasts, especially in countries like Germany, where electricity prices are among the highest in Europe.
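To get a feeling for the order of magnitude, the running costs can be estimated with simple arithmetic. The sketch below is only an illustration: the average power draw, the hours per day and the electricity price are assumed example values, not measurements from my system, and the function name is made up.

```python
def yearly_energy_cost(power_watt, hours_per_day, price_per_kwh, days=365):
    """Estimate the yearly electricity cost of GPU training runs.

    power_watt    : average power draw of the card during the runs
    hours_per_day : average daily duration of the runs
    price_per_kwh : electricity price per kWh in your currency
    """
    energy_kwh = power_watt / 1000.0 * hours_per_day * days
    return energy_kwh * price_per_kwh


# Assumed example: 150 W average draw, 6 hours per day, 0.30 EUR per kWh
cost = yearly_energy_cost(150, 6, 0.30)
print(f"Estimated yearly cost: {cost:.2f} EUR")
```

Note that this covers the graphics card only. The CPU, the RAM and the power supply losses of the system add to the bill.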

Another important aspect is the rise in GPU core temperature. I have experienced peak GPU temperatures of more than 75 °C, up to 80 °C, combined with high fan rotation rates and the respective noise. The more powerful a graphics card of a given chip generation is, the more relevant the cooling and the associated noise problems become. So, if having a quiet, relatively cool system matters to you, a compromise regarding the performance level of a new GPU card may be appropriate from the very beginning.

VRAM vs. GPU performance

An RTX 4090 card with 24 GB VRAM may be something you dream of as a Linux PC user, but something you cannot afford. Looking at the model range of the RTX 40 series, a serious question then arises: Should you focus on a cheaper model with less GPU power but more VRAM, or the other way round?

My advice is: It depends on your type of experiments, but in most cases, and for the main purpose of studying various types of modern ANNs, GANs and Transformers, the size of the available VRAM is more important.

Why? Well, you may be able to wait for the result of an overnight calculation. But for really deep ANNs, like some variants of CNNs, RNNs or other networks with many layers, a lack of VRAM may render your planned experiments impossible, even if you load your data during training and/or evaluation runs in really small batches. In any case you must have enough VRAM to keep the ANN’s model parameters and two or more batches in memory at the same time. Similar arguments hold for (transformer) networks handling texts and respective vector models. Even some steps for the preparation of texts may require a significant amount of VRAM.
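The VRAM requirement described above can be roughly estimated before buying a card. The following Python sketch is a back-of-the-envelope calculation under simplifying assumptions of my own: all values are stored as 32-bit floats, the optimizer keeps two extra values per parameter (as Adam-like optimizers do), and framework overhead is ignored. The function and parameter names are hypothetical, and real consumption will usually be higher.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_training_vram_gib(n_params, batch_size, input_values_per_sample,
                               activation_values_per_sample,
                               bytes_per_value=4, optimizer_states=2,
                               batches_in_vram=2):
    """Rough lower bound of the VRAM needed for a training run, in GiB."""
    # Weights + gradients + optimizer state copies of the parameters
    param_bytes = n_params * (2 + optimizer_states) * bytes_per_value
    # Input batches held in VRAM at the same time
    data_bytes = (batches_in_vram * batch_size
                  * input_values_per_sample * bytes_per_value)
    # Intermediate layer outputs kept for backpropagation of one batch
    activation_bytes = (batch_size * activation_values_per_sample
                        * bytes_per_value)
    return (param_bytes + data_bytes + activation_bytes) / GIB


def fits_in_vram(estimate_gib, vram_gib, headroom=0.9):
    """Check the estimate against a card, leaving some VRAM as reserve."""
    return estimate_gib <= vram_gib * headroom
```

The estimate shows why reducing the batch size helps only up to a point: the parameter-related term does not depend on the batch size at all.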

In addition, VRAM and the total turnaround performance of many ML algorithms are not at all independent of each other. The more data you can keep in VRAM during your runs the better. Data transfer from and to the RAM is costly in terms of total turnaround time.

Note that there typically are two batch sizes which may become relevant: One determines how many data vectors are handled before your model parameters are updated during training runs. The VRAM organization of a concrete tensor algorithm has some degrees of freedom, but in general a larger value of this batch size raises VRAM consumption. The other relevant batch size is that of the packets during batched data transfer from the RAM (or disks) to the GPU’s VRAM. Depending on your PCIe bus width and the graphics card, larger transfer batches may have an additional impact on performance. Transferring data efficiently from the RAM to the GPU and back often requires a delicate balance between system capabilities and the chosen transfer batch size. A larger transfer batch size also raises the VRAM requirements.
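The interplay of the two batch sizes can be illustrated with a small, framework-independent Python sketch. It only shows the control flow: a plain list stands in for the data in RAM, and taking a slice stands in for one host-to-GPU copy. The function name and both parameter names are hypothetical; a real setup would use the data loading tools of the ML framework.

```python
def iter_training_batches(samples, transfer_size, train_batch_size):
    """Yield training mini-batches from data fetched in larger chunks.

    transfer_size    : number of samples moved to the GPU per transfer
    train_batch_size : number of samples handled per parameter update
    """
    for start in range(0, len(samples), transfer_size):
        # Stands in for one batched transfer from RAM to VRAM
        chunk = samples[start:start + transfer_size]
        for i in range(0, len(chunk), train_batch_size):
            # One mini-batch, after which the parameters would be updated
            yield chunk[i:i + train_batch_size]


# 10 samples, transferred in chunks of 4, trained in mini-batches of 2
for batch in iter_training_batches(list(range(10)), 4, 2):
    print(batch)
```

A larger `transfer_size` means fewer transfers over the PCIe bus, but each chunk occupies more VRAM while it is being processed.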

So VRAM is at least as important as the raw GPU performance in terms of GPU core and VRAM frequencies. In most cases VRAM is even more important. For certain types of deep neural networks the amount of available VRAM may become the dominant criterion, which must be fulfilled before any kind of experiment is possible.

Reasonable VRAM sizes for a start

I did my first ML experiments on a card with only 4GB of VRAM. You can do a lot with such a card. But the more you play around with deep and relatively modern ANNs, the more painful it gets and the more time you must invest in programming tricks. I would say: For a start, graphics cards with 8GB and a GPU above the GTX 960 level are sufficient. If you really plan to study generative ML algorithms, really deep neural networks or NLP algorithms, at least 16 GB of VRAM is a must.

Regarding price vs. VRAM: It may be more reasonable to buy two cards, each with 16GB VRAM and a less powerful GPU, than a single top-end GPU with 24GB VRAM.

Conclusion

Most often VRAM is more important than pure GPU performance – at least for people who want to study basic ML algorithms and ANN properties. Choosing less GPU power may also be consistent with reducing your system’s overall power consumption and its heat as well as its noise level.