Sounds like: 1. XML is the cleanest, highest-quality training data (especially compared to PDF/HTML). 2. It follows that a user who provides semantic tags in XML format gets the best alignment with the training data (hence the best results). Shame they haven't quantified this assertion here.
Slinging doom and gloom on the internet seems like engagement-bait to me at this point. If the suppliers aren't increasing production, they clearly see something all these armchair doomers do not. I'm sure prices will return to "normal" levels sooner than people think.