I just managed to make Mistral NeMo 4-bit QLoRA finetuning fit in under 12GB of VRAM, so it runs in a free Google Colab on a Tesla T4 GPU! VRAM usage is cut by 60% and finetuning is also 2x faster! Colab: https://colab.research.google.com/github/unslothai/studio/bl...
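
For anyone curious what a setup along these lines looks like, here is a minimal sketch of loading a 4-bit quantized base model and attaching LoRA adapters with the unsloth library. The checkpoint name and hyperparameters are illustrative assumptions, not necessarily what the linked Colab uses:

    from unsloth import FastLanguageModel

    max_seq_length = 2048

    # Load the base model already quantized to 4-bit; the quantized
    # weights stay frozen during QLoRA finetuning.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "unsloth/Mistral-Nemo-Base-2407-bnb-4bit",  # assumed checkpoint name
        max_seq_length = max_seq_length,
        dtype = None,            # auto-detect (float16 on a Tesla T4)
        load_in_4bit = True,     # 4-bit quantization via bitsandbytes
    )

    # Attach small trainable LoRA adapters on top of the frozen 4-bit weights.
    model = FastLanguageModel.get_peft_model(
        model,
        r = 16,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                          "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 16,
        lora_dropout = 0,
        bias = "none",
        use_gradient_checkpointing = "unsloth",  # offload activations to save VRAM
        random_state = 3407,
    )

From there the model and tokenizer can be handed to something like TRL's SFTTrainer for the actual finetuning loop.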