That’s an awfully specific and esoteric question. Would you expect gpt4 to be significantly better at that level of depth? That’s not been my experience.
OK, I have to admit that one was a little odd; I was beginning to give up and was trying new angles. I can't really share my other sessions, but I was trying to get a handle on the language and thought it would be an easily understood situation (multiple-token auth). I would have at least expected the response to be somewhat valid.
The language in question was only open-sourced after GPT-4's training cutoff, so I couldn't compare. That's actually why I tried it in the first place. And yes, I do expect it to be better: GPT-4 isn't perfect, but I don't really recall it ever hallucinating quite that hard. In fact, its answer was basically that it didn't know.
And when I asked it questions about other, much less esoteric code, like "how would you refactor this to be more idiomatic?", I'd get either "I couldn't complete your request. Rephrase your prompt and try again." or "Sorry, I can't help with that because there's too much data. Try again with less data." GPT-4 was helpful in both cases.
My experience has been that GPT-4 will happily hallucinate the details when I go too deep. Like you mentioned, it will invent new syntax and function calls.
Edit: as pointed out, this was indeed a pretty esoteric example. But the rest of my attempts were hardly better, when they got a response at all.