Eh, what if it were trained on every case that has ever existed? I think it could be pretty good, as long as it treated anything novel as an error case to flag and confirm rather than guess at.
That's not the point. LLMs work by predicting what text to generate next. They don't work by choosing facts; they work by saying whatever sounds most appropriate. That's why they're so confidently wrong. No amount of training will eliminate this problem: it's an issue with the architecture of today's LLMs.
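For a concrete (toy) picture of what "predicting what text to generate next" means, here's a minimal sketch of greedy decoding. The vocabulary and the scores are made up, not from any real model, but it shows the point: the code picks whichever continuation scores highest, and nothing in it ever asks whether the result is true.

```python
import math

def softmax(logits):
    # Turn raw scores into a probability distribution over the vocabulary.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Made-up scores for the next token after "The defendant cited Smith v."
fake_logits = {"Jones": 3.1, "Doe": 2.4, "Johnson": 1.9, "[no such case]": 0.2}

probs = softmax(fake_logits)
next_token = max(probs, key=probs.get)
# The choice is purely "what sounds most likely here" -- whether Smith v. Jones
# actually exists never enters the computation.
print(next_token, probs[next_token])
```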
You could layer another system on top of the LLM's output that looks up any cases it cites and discards the response if they don't exist, but that only solves that one failure mode.
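Something like this, as a rough sketch; the citation regex and the "known cases" set are hypothetical stand-ins for a real case-law database lookup, not an actual API:

```python
import re

# Hypothetical stand-in: in practice this would query a real case-law database.
KNOWN_CASES = {"Miranda v. Arizona", "Gideon v. Wainwright"}

CITATION_PATTERN = re.compile(r"[A-Z][A-Za-z']+ v\. [A-Z][A-Za-z']+")

def cited_cases(response: str) -> set[str]:
    # Naive citation extraction; real citations are far messier than this regex.
    return set(CITATION_PATTERN.findall(response))

def verify_response(response: str) -> str | None:
    # Discard the response if it cites any case we can't find.
    unknown = cited_cases(response) - KNOWN_CASES
    return None if unknown else response

print(verify_response("Per Miranda v. Arizona, the statement is inadmissible."))
print(verify_response("As held in Smith v. Flurble, dismiss the charge."))  # None
```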
There are other kinds of failures that will be much harder to detect: arguments that sound right but are logically flawed, context lost because it can't read body language or tone of voice, and the lack of a coherent strategy, to name a few.
All of these things could theoretically be solved individually, but each would require adding new systems, each with its own new failure modes. At our current technological level the problem is intractable, even for seemingly simple cases like this one. A defendant is better off defending themselves with their own preparation than relying on modern AI in the heat of the moment.
It's bizarre that anyone who supposedly works in technology thinks this is realistic. It betrays a profound lack of knowledge of the technology and a childlike understanding of the legal system.
It fails at determining whether a number is prime and offers bogus arguments to that effect, and you think it makes sense for it to argue complex legal cases with strategy? This isn't Go or chess.
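For contrast, the primality question it fumbles is a few lines of boring, deterministic code (trial division; fine for small numbers):

```python
def is_prime(n: int) -> bool:
    # Trial division: correct for every n, no confident guessing involved.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

print(is_prime(97))      # True
print(is_prime(10_007))  # True
print(is_prime(10_001))  # False (73 * 137)
```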