
Yeah, I was equally surprised. I'm using a patched version of ollama to run the models: https://github.com/austinksmith/ollama37 which has a trivial change that lets it run on GPUs with old CUDA compute capabilities (3.5, 3.7). Obviously those cards predate tensor cores, so you're not going to be blown away by the performance, but it was cheap: I got 3x K40s for $75 on eBay. They're passively cooled, so they do need to be in a server chassis.
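For context on why the patch is needed: stock ollama requires a CUDA compute capability of at least 5.0 (Maxwell or newer), while the K40 is 3.5 and the K80 is 3.7. A minimal sketch of that check, with a hypothetical `check_cc` helper (the 5.0 cutoff is ollama's documented minimum; the function itself is just for illustration):

```shell
# Minimum compute capability supported by stock ollama builds.
min_cc_stock="5.0"

check_cc() {
  # $1 = a GPU's CUDA compute capability, e.g. "3.5" for a K40.
  # awk does the floating-point comparison; plain shell test can't.
  awk -v cc="$1" -v min="$min_cc_stock" 'BEGIN { exit !(cc < min) }' \
    && echo "needs ollama37 patch" \
    || echo "stock ollama ok"
}

check_cc 3.5   # K40 (Kepler)
check_cc 3.7   # K80 (Kepler)
check_cc 8.6   # RTX 3090 (Ampere)
```

You can read a card's actual compute capability from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on reasonably recent drivers.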
