As someone also building constrained decoders against JSON [1], I was hoping to see the same here, but I note the following from their documentation [2]:
The model can choose to call a function; if so, the content will be a stringified JSON object adhering to your custom schema (note: the model may generate invalid JSON or hallucinate parameters).
So sadly, it is just fine-tuning. There's no hard biasing applied to the token logits :( You were so close, and yet so far, OpenAI!
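For anyone wondering what I mean by hard biasing, here is a minimal toy sketch. Everything in it is an assumption for illustration: the tiny `VOCAB`, the `ALLOWED` list standing in for a real JSON schema, and `fake_model` standing in for an LLM. It is not how clownfish or OpenAI implement anything; it only shows the idea of setting the logits of invalid tokens to -inf before picking the next token.

```python
import math

# Toy vocabulary (assumption); a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"name"', ':', '"Bob"', 'hello', ',']

# Hypothetical stand-in for a schema: the only complete outputs we accept.
ALLOWED = ['{"name":"Bob"}']


def is_valid_prefix(text):
    """True if text can still be extended into an allowed output."""
    return any(target.startswith(text) for target in ALLOWED)


def mask_logits(logits, prefix):
    """Hard biasing: any token that would break validity gets a logit of -inf."""
    return [
        logit if is_valid_prefix(prefix + token) else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]


def constrained_greedy_decode(next_logits, max_steps=16):
    """Greedy decoding where only valid continuations can ever be chosen."""
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:
            break
        masked = mask_logits(next_logits(out), out)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise ValueError("no valid continuation from: " + out)
        out += VOCAB[best]
    return out


def fake_model(prefix):
    # Fake model (assumption) that always prefers 'hello', which is never valid.
    return [0.1, 0.2, 0.3, 0.4, 0.5, 9.0, 0.0]


result = constrained_greedy_decode(fake_model)
print(result)
```

Even though the fake model always wants to say `hello`, the mask means it can only ever emit output that matches the schema. With fine-tuning alone there is no such guarantee, which is exactly what the documentation caveat admits.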
Good point. Backtracking is certainly possible, but it is probably tricky to parallelize at scale if you're trying to coalesce and slam through a bunch of concurrent (unrelated) requests with minimal pre-emption.
[1] https://github.com/newhouseb/clownfish
[2] https://platform.openai.com/docs/guides/gpt/function-calling