You can fit the weights plus a tiny context window into 24GB, absolutely. But you can't fit a context of any reasonable size. Either that or Ollama's implementation is broken; the last time I tried it, it had to be restricted beyond usability just to keep it from freezing up the entire machine.
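The back-of-the-envelope math makes the point: the KV cache grows linearly with context length and can rival the weights themselves. A rough sketch, assuming a generic Llama-style architecture with grouped-query attention and fp16 cache (the layer/head/dim numbers are illustrative, not any specific model):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # 2x for keys AND values; fp16 = 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 70B-class config: 80 layers, 8 KV heads (GQA), head_dim 128
gib = kv_cache_bytes(80, 8, 128, 32_768) / 2**30
print(f"{gib:.1f} GiB for a 32k context")  # -> 10.0 GiB, on top of the weights
```

So even with a heavily quantized model squeezed into 24GB, a long context can blow the budget on its own, which is why runtimes clamp it so aggressively.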
LLMs do typically encode a confidence level in their internal representations; they just don't surface it when asked. There were multiple papers on this a few years back that got reasonable results out of it, though I think that was in the GPT-3.5 era.
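The crudest version of that signal is just the next-token probability: softmax the output logits and look at the mass on the chosen token. A toy sketch with made-up logits, not any real model's output:

```python
import math

def token_confidence(logits):
    # Numerically stable softmax over a (toy) vocabulary; the probability
    # of the top token is one crude "confidence" the model computes anyway.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return max(e / total for e in exps)

# One logit clearly dominates -> high confidence
print(token_confidence([8.0, 1.0, 0.5, 0.0]))   # close to 1.0
# Flat logits -> the model is maximally unsure
print(token_confidence([1.0, 1.0, 1.0, 1.0]))   # 0.25
```

The papers went further than this (probing internal activations, calibration, etc.), but even this surface-level number is information the model has and a plain chat interface throws away.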
Are you seriously comparing discrimination based on factors no one can control to a group literally defined by a choice they made? And you think that's a good-faith argument?
People hate C because it's hard; people hate C++ because it truly is rubbish. Rubbish that deserved to be tried, but that we've now learned was a mistake and should move on from.