Does that entire model fit in GPU memory? How does it run?
I tried running a model larger than my GPU's memory; it loads some layers onto the GPU and offloads the rest to the CPU. It's faster than CPU alone for me, but not by a lot.
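For what it's worth, you can see the split for yourself and even tune it. A rough sketch (assumes Ollama is already running; `num_gpu` is Ollama's parameter for how many layers to offload, and 20 here is just an illustrative number, not a recommendation):

```shell
# Show loaded models and how each is split between CPU and GPU
# (the PROCESSOR column reports the offload ratio)
ollama ps

# Optionally pin the number of GPU-offloaded layers via a Modelfile
cat > Modelfile <<'EOF'
FROM deepseek-r1:32b
PARAMETER num_gpu 20
EOF
ollama create deepseek-r1-gpu20 -f Modelfile
ollama run deepseek-r1-gpu20
```

Lowering `num_gpu` trades speed for VRAM headroom, which can help if the default split is causing the GPU to thrash.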
Nice, last time I tried out ROCm on Arch a few years ago it was a nightmare. Glad to see it's just one package install away these days, assuming you didn't do any setup beforehand.
Oooh this is great, thank you. I've been wanting to set up something like this for quite a while and haven't really spent the time to figure out how I'd do it. Glad to have an option just land on my screen like this! Cheers :)
Also just realized there are one or two other services they could redirect (e.g. Medium -> scribe.rip). Will see if it's feasible for them to easily add...
1. sudo pacman -S ollama-rocm
2. ollama serve
3. ollama run deepseek-r1:32b
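A quick sanity check to confirm the GPU is actually being used (assumes `rocm-smi`, which comes in with the ROCm dependencies on Arch; the `HSA_OVERRIDE_GFX_VERSION` value shown is a commonly used workaround for consumer RDNA2 cards and may not apply to your GPU):

```shell
# Verify ROCm can see the GPU at all
rocm-smi

# Watch GPU utilization/VRAM while a model is generating in another terminal
watch -n 1 rocm-smi

# If your card isn't officially supported, some GPUs work with a gfx override
# (example value for RDNA2 consumer cards; check your architecture first)
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
```

If utilization stays near zero while generating, Ollama has most likely fallen back to CPU-only.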