
Mixture of Experts. Llama 3.1 405B is a dense model, so to evaluate the next token, the context has to go through literally every parameter in the network. With a mixture-of-experts model, only a fraction of the parameters, often a sixth to a tenth or even less, actually get evaluated for each token. Also, they don't serve GPT-4 or 4.5 anymore iirc, which may have been dense (and that's why they were so expensive); 4.1 and 4o are much different models.
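
To illustrate the point about only a fraction of parameters being touched per token, here's a minimal top-k routing sketch in plain numpy. All the sizes (d_model, n_experts, top_k) and the single-matrix "experts" are made up for illustration, not the actual architecture of any of the models named above:

    import numpy as np

    d_model   = 16    # hidden size of the token representation (hypothetical)
    n_experts = 8     # total expert FFNs available (hypothetical)
    top_k     = 2     # experts actually evaluated per token (hypothetical)

    rng = np.random.default_rng(0)

    # Each "expert" stands in for a feed-forward block; here just one weight matrix.
    experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
    # The router scores every expert for a given token.
    router_w = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_forward(x):
        """Route one token vector x through only top_k of the n_experts."""
        logits = x @ router_w                 # (n_experts,) routing scores
        top = np.argsort(logits)[-top_k:]     # indices of the chosen experts
        weights = np.exp(logits[top])         # softmax over the chosen experts only
        weights /= weights.sum()
        # Only the top_k expert matrices are read; the other experts' parameters
        # are never touched for this token, which is where the compute saving comes from.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    token = rng.standard_normal(d_model)
    out = moe_forward(token)
    print(out.shape)  # (16,)
    print(f"experts touched per token: {top_k}/{n_experts}")

In a dense model the equivalent of every expert matrix would participate in every token's forward pass; here the router picks 2 of 8, so roughly a quarter of the expert parameters are used per token (real MoE models push that ratio much lower).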

