Interesting, I didn't know this was so sought after.
I actually have one for sale (the 40GB PCIe one), but I haven't gotten around to listing it on eBay yet due to lack of time (and because I didn't think there was this much interest in it).
To be honest, maybe for DL this really is much better than the alternatives, but for some simulations and for parallelizing some radiative transfer code, it wasn't that much better than an RTX 4090, with the extra hassle that it's more difficult to cool.
The A100 is comparable to the 3090 but with more memory. The H100 is the one comparable to the 4090.
The advantage of these is access to the larger memory. They can also be linked together via NVLink such that they all share the same memory, which makes them scalable for processing large datasets and holding the weights of larger-scale LLMs and other NN/ML models.
>They can also be linked together via NVLink such that they all share the same memory, which makes them scalable for processing large datasets and holding the weights of larger-scale LLMs and other NN/ML models.
GPUs connected with NVLink do not exactly share memory. They don't look like a single logical GPU. One GPU can issue loads or stores to a different GPU's memory using "GPUDirect Peer-To-Peer", but you cannot have a single buffer or a single kernel that spans multiple GPUs. This is easier to use and more powerful than the previous system of explicit copies from device to device, perhaps, but a far cry from the way multiple CPU sockets "just work". Even if you could treat the system as one big GPU you wouldn't want to. The performance takes a serious hit if you constantly access off-device memory.
NVLink doesn't open up any functionality that isn't available over PCIe, as far as I know. It's "merely" a performance improvement. The peer-to-peer technology still works without NVLink.
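For what it's worth, here's a minimal CUDA sketch of what peer-to-peer means in practice (the kernel and buffer sizes are made up for illustration): after enabling peer access, a kernel launched on GPU 0 can dereference a pointer that was allocated on GPU 1, but that buffer still physically lives on GPU 1. The same code runs over plain PCIe; NVLink just makes the remote accesses faster.

```
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel running on GPU 0 that reads directly from a buffer
// resident in GPU 1's memory (peer access) and writes a local copy.
__global__ void read_peer(const float* peer_buf, float* local_buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) local_buf[i] = peer_buf[i] * 2.0f;  // this load crosses NVLink or PCIe
}

int main() {
    const int n = 1 << 20;
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, /*device=*/0, /*peerDevice=*/1);
    if (!can_access) { printf("P2P not supported between GPU 0 and GPU 1\n"); return 1; }

    // Allocate a buffer on GPU 1; it stays there.
    float* buf_gpu1 = nullptr;
    cudaSetDevice(1);
    cudaMalloc(&buf_gpu1, n * sizeof(float));
    cudaMemset(buf_gpu1, 0, n * sizeof(float));

    // Enable peer access from GPU 0 to GPU 1, then launch a kernel on GPU 0
    // that dereferences the GPU 1 pointer directly -- no explicit cudaMemcpy.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(/*peerDevice=*/1, 0);
    float* buf_gpu0 = nullptr;
    cudaMalloc(&buf_gpu0, n * sizeof(float));
    read_peer<<<(n + 255) / 256, 256>>>(buf_gpu1, buf_gpu0, n);
    cudaDeviceSynchronize();

    cudaFree(buf_gpu0);
    cudaSetDevice(1);
    cudaFree(buf_gpu1);
    return 0;
}
```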
NVidia's docs are, as always, confusing at best. There are several similarly-named technologies. The main documentation page just says "email us for more info". The best online documentation I've found is in some random slides.
Interesting. So that would mean you would still need a 40 or 80 GB card to run the larger models (30B, 70B, 8x7B LLMs) and to train them.
Or would it be possible to split the model layers between the cards, like you can between RAM and VRAM? I suppose in that case each card would evaluate the layers held in its own memory and then pass the results to the other card(s) as necessary.
You don't need NVLink for inference with models that need to be split across multiple cards. I'm using a laptop with a 3080 Ti mobile and a 3090 in an eGPU enclosure to run LLMs over 24GB.
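Roughly, the split works like the toy CUDA sketch below (the fake_layer kernel is just a made-up stand-in for real model layers): each card only ever touches weights and activations in its own memory, and the only cross-card traffic is the activation tensor copied between stages, which is why NVLink isn't required for this.

```
#include <cuda_runtime.h>

// Stand-in for "a layer held on this card": just adds a constant in place.
__global__ void fake_layer(float* x, int n, float w) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += w;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // "First half of the model" lives and runs on GPU 0.
    float* act0;
    cudaSetDevice(0);
    cudaMalloc(&act0, bytes);
    cudaMemset(act0, 0, bytes);
    fake_layer<<<(n + 255) / 256, 256>>>(act0, n, 1.0f);

    // Hand the activations to GPU 1. cudaMemcpyPeer works over plain PCIe;
    // NVLink only changes the bandwidth/latency of this hop.
    float* act1;
    cudaSetDevice(1);
    cudaMalloc(&act1, bytes);
    cudaMemcpyPeer(act1, /*dstDevice=*/1, act0, /*srcDevice=*/0, bytes);

    // "Second half of the model" runs on GPU 1.
    fake_layer<<<(n + 255) / 256, 256>>>(act1, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(act1);
    cudaSetDevice(0);
    cudaFree(act0);
    return 0;
}
```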
As someone who used various types of GPUs in graduate school: for most simulations, and even machine learning (unless you need the VRAM), you are generally better off going with a consumer card. There are about the same number of CUDA cores, and the higher clock speeds will generally net you better performance overall.
The exception is simulations that need double-precision floating point (which you previously could get in the Titan series of consumer-ish cards). Where the datacenter cards are super important for DL is the VRAM, which lets you use much larger models, plus the added feature of being able to string them together and share memory, which has been left off consumer cards (honestly in a way that makes sense, because SLI has been dumb for some time).
How did you end up cooling it? I have an A40 and it's been interesting testing all kinds of methods, from two 40mm fans to a 3A 9030 centrifugal blower with a 3D-printed duct.