Llamafile is great and I love it. I run all my models with it and it's super portable; I've tested it on Windows and Linux, on a powerful PC and on an SBC, and it worked great without too many issues.
It takes about a month for features from llama.cpp to trickle in. Also, figuring out the best balance of context length, VRAM usage, and desired speed takes a while before it gets intuitive. See the sketch below for the knobs I mean.
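As a rough sketch of that tuning, llamafile passes through the usual llama.cpp flags (-c for context size, -ngl for GPU layer offload); the model filename here is just a placeholder:

```sh
# Placeholder model name; -c sets the context length, -ngl sets how many
# layers get offloaded to the GPU. Dropping -c or offloading fewer layers
# cuts VRAM use, at the cost of context window or generation speed.
./Meta-Llama-3-8B-Instruct.Q4_0.llamafile -ngl 999 -c 4096 -p "Hello"
```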
https://github.com/Mozilla-Ocho/llamafile