It seemingly is, but it's also one of the most popular ways public models are being updated with SFT. "Trained using ChatGPT output" is blatantly advertised in the literature of some fairly popular "open" models. I'm not sure where that's headed legally, but it's surprisingly common.
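For concreteness, the practice being described is basically distillation: collect prompt/response pairs from the API and run ordinary SFT on an open model with them. A minimal sketch of the data-collection half, assuming the current OpenAI Python client; the model name, prompts, and file layout are illustrative, not anything from the papers in question:

```python
# Sketch: build a distillation dataset from API outputs for later SFT.
# Assumes OPENAI_API_KEY is set; model name and JSONL schema are illustrative.
import json
from openai import OpenAI

client = OpenAI()

prompts = [
    "Explain the difference between a process and a thread.",
    "Write a haiku about garbage collection.",
]

with open("distill_sft.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; any chat model works here
            messages=[{"role": "user", "content": prompt}],
        )
        # One prompt/completion pair per line, the usual SFT input format.
        f.write(json.dumps({
            "prompt": prompt,
            "completion": resp.choices[0].message.content,
        }) + "\n")
```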
The academic work is pretty safe as long as it isn't productized. The open models have a prima facie case to stand on. Using output is okay if you aren't directly competing with OpenAI, even according to their ToS:
> (e) use Output (as defined below) to develop any artificial intelligence models that compete with our products and services. However, you can use Output to (i) develop artificial intelligence models primarily intended to categorize, classify, or organize data (e.g., embeddings or classifiers), as long as such models are not distributed or made commercially available to third parties and (ii) fine tune models provided as part of our Services
But then those models possibly get used downstream, e.g. in Mistral's "medium" API model (and many other startups).
I guess if it's behind an API and no one discloses the training data, OpenAI can't prove anything? Even obvious GPT-isms could ostensibly have come from internet data.