13B in GPTQ 4-bit has practically no quality loss and runs in 8GB. I get 8 tokens/second on my laptop CPU and 4 tokens/second on my phone CPU.

Even 33B only needs 20GB of VRAM in GPTQ 4-bit.

8-bit has essentially zero perplexity loss, so there's really no reason to run in 16-bit.

Even a $200 P40 24GB is enough to run 33B at extremely high speeds in GPTQ 4-bit.
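
If you want to sanity-check the memory numbers, here's a rough back-of-the-envelope sketch. It only counts the raw weights at the nominal bit width. `weight_gb` is a helper name I made up for illustration, not part of any GPTQ tool. Real GPTQ checkpoints carry some extra overhead for per-group scales and zero points, and you also need headroom for the KV cache and activations, which is roughly where the gap between these figures and the 8GB / 20GB numbers comes from.

```python
def weight_gb(params_billion: float, bits: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits` bits.
    Ignores quantization metadata, KV cache, and activations.
    """
    return params_billion * bits / 8


# Compare the model sizes mentioned above at 4-bit, 8-bit, and 16-bit.
for size in (13, 33):
    for bits in (4, 8, 16):
        print(f"{size}B @ {bits}-bit: ~{weight_gb(size, bits):.1f} GB of weights")
```

By this estimate the 4-bit weights come to about 6.5 GB for 13B and 16.5 GB for 33B, so the 33B model plus overhead fits on a 24GB card, while the same model in 16-bit would need around 66 GB for the weights alone.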