
Welp, my data point of one shows you need more than 8 GB of VRAM.

When I run mistral-chat with Nemo-Instruct, it crashes within about 5 seconds with the error: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB. GPU"

This is on Ubuntu 22.04.4 with an NVIDIA GeForce RTX 3060 Ti (8192 MiB). I ran "nvidia-smi -lms 10" to watch memory usage, and the last reading it recorded before the crash was 7966 MiB.
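For what it's worth, the 12B parameters at bf16 are roughly 24 GB of weights alone, so an 8 GB card was never going to hold the full-precision model. If you want to squeeze it onto a 3060 Ti anyway, 4-bit quantization should get the weights down to around 7 GB. Here's a rough sketch using transformers + bitsandbytes instead of mistral-chat's own loader (I'm assuming the mistralai/Mistral-Nemo-Instruct-2407 checkpoint on Hugging Face; I haven't tested this on that exact card):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "mistralai/Mistral-Nemo-Instruct-2407"

  # NF4 4-bit quantization: ~12B params at ~0.5 bytes each (plus overhead)
  # instead of ~24 GB for bf16 weights.
  bnb = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  tok = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id, quantization_config=bnb, device_map="auto"
  )

  msgs = [{"role": "user", "content": "Say hello in five words."}]
  inputs = tok.apply_chat_template(
      msgs, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)
  out = model.generate(inputs, max_new_tokens=64)
  print(tok.decode(out[0], skip_special_tokens=True))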



On Ubuntu 22.04, after clearing some smaller processes off the GPU (like gnome-remote-desktop-daemon), I am able to start Mistral-Nemo 2407 with mistral-chat and get a prompt on an RTX 4090. But after entering the prompt it still fails with OOM, so, as someone noted, it only narrowly fits on a 4090.
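In case it helps anyone else do the same cleanup: here's a quick sketch with pynvml (pip install nvidia-ml-py) that lists who is holding GPU memory, including graphics clients like gnome-remote-desktop-daemon. It reads the same counters nvidia-smi does:

  import pynvml

  pynvml.nvmlInit()
  handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

  mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
  print(f"used {mem.used / 1024**2:.0f} MiB / {mem.total / 1024**2:.0f} MiB")

  # Graphics clients (desktop daemons, compositors) plus compute clients.
  procs = (pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
           + pynvml.nvmlDeviceGetComputeRunningProcesses(handle))
  for proc in procs:
      # usedGpuMemory can be None on some drivers
      used = (proc.usedGpuMemory or 0) / 1024**2
      print(f"pid {proc.pid}: {used:.0f} MiB")

  pynvml.nvmlShutdown()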


Agreed, it narrowly fits on an RTX 4090. Yesterday I rented an RTX 4090 on vast.ai and set up Mistral-Nemo-2407. I got it to work, but just barely: I can run mistral-chat, get the prompt, and it starts generating a response after 10 to 15 seconds. The second prompt always crashes it immediately with an OOM error. I almost bought an RTX 4090 from Best Buy, but it would have cost $2,000 after tax, so I'm glad I only spent 40 cents instead.
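My guess, and it is only a guess, is that the second prompt dies because the KV cache grows with the conversation history, so a model that narrowly fits on turn one runs out on turn two. If you drive the model from PyTorch instead of mistral-chat, you can watch it happen with a couple of built-in counters:

  import torch

  def report_vram(tag: str) -> None:
      # memory_allocated = tensors actually in use; memory_reserved = what
      # PyTorch's caching allocator holds (closer to what nvidia-smi shows).
      alloc = torch.cuda.memory_allocated() / 1024**2
      reserved = torch.cuda.memory_reserved() / 1024**2
      print(f"{tag}: allocated {alloc:.0f} MiB, reserved {reserved:.0f} MiB")

  report_vram("after turn 1")
  torch.cuda.empty_cache()  # hand cached-but-free blocks back to the driver
  report_vram("after empty_cache")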



