Even a 33B model only needs about 20GB of VRAM in 4-bit GPTQ.
8-bit has essentially zero perplexity loss, so there's really no reason to run in 16-bit.
Even a $200 P40 with 24GB is enough to run 33B at high speeds in 4-bit GPTQ.
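
Rough back-of-the-envelope for where a figure like 20GB comes from. This is only a sketch: it assumes a nominal 33e9 parameters (the exact count may differ) and counts the weights alone, ignoring quantization metadata, the KV cache, and activations. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory taken by the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal "33B" parameter count, not an exact figure.
N_PARAMS = 33e9

for bits in (16, 8, 4):
    # Weights only; runtime overhead comes on top of this.
    print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

At 4-bit the weights alone come to 16.5 GB, so the remaining few GB of a ~20GB footprint would plausibly be quantization metadata, context cache, and runtime overhead. At 16-bit the weights alone are 66 GB, which is why a single 24GB card is not an option there.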