
> We cannot test every operation on every possible combination of numbers.

You can spot-check with a survey of random samples. That’s also how we often test human abilities.
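For two-operand arithmetic, such a spot check is easy to automate. A minimal sketch in Python, assuming a hypothetical ask_llm function that wraps whatever model API you’re using:

    import random
    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def spot_check(ask_llm, n_samples=100, max_digits=6):
        """Estimate accuracy on randomly sampled arithmetic problems."""
        correct = 0
        for _ in range(n_samples):
            sym, fn = random.choice(list(OPS.items()))
            a = random.randint(0, 10 ** random.randint(1, max_digits))
            b = random.randint(0, 10 ** random.randint(1, max_digits))
            expected = fn(a, b)
            answer = ask_llm(f"Compute {a} {sym} {b}. Reply with only the number.")
            try:
                correct += int(answer.strip().replace(",", "")) == expected
            except ValueError:
                pass  # malformed reply counts as wrong
        return correct / n_samples

Sampling across operand sizes matters here, since accuracy tends to fall off with digit count rather than uniformly.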

What's interesting is that the quality of an LLM’s answers when asked to explain arithmetic and when asked to perform arithmetic don’t seem to be correlated. I.e., an LLM might be able to explain arithmetic perfectly but completely fail at performing it.
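If you wanted to measure that disconnect rather than observe it anecdotally, you could pair the two kinds of prompt on problems of equal difficulty. A rough sketch, again assuming the hypothetical ask_llm function above; explanations still need human grading, so this only collects the raw material:

    import random

    def explain_vs_perform(ask_llm, n_samples=20):
        """Pair an explanation request with a same-difficulty computation.
        Explanations are graded by hand; computations are auto-graded."""
        results = []
        for _ in range(n_samples):
            a, b = random.randint(100, 999), random.randint(100, 999)
            explanation = ask_llm(
                f"Explain, step by step, how to compute {a} * {b} "
                f"by long multiplication, without stating the final product.")
            answer = ask_llm(f"Compute {a} * {b}. Reply with only the number.")
            try:
                performed_ok = int(answer.strip().replace(",", "")) == a * b
            except ValueError:
                performed_ok = False  # malformed reply counts as wrong
            results.append((explanation, performed_ok))
        return results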

In humans, we don’t expect this to be the case (although there are examples of the opposite: idiot savants, or, to a lesser degree, children who can perform a task but not explain it).

This disconnect seems to be one of the most important differences between LLMs and human intelligence.


