The academic work is pretty safe as long as it isn't productized, and the open models have a prima facie case to stand on. Using Output is fine as long as you aren't directly competing with OpenAI, even according to their own ToS:
> (e) use Output (as defined below) to develop any artificial intelligence models that compete with our products and services. However, you can use Output to (i) develop artificial intelligence models primarily intended to categorize, classify, or organize data (e.g., embeddings or classifiers), as long as such models are not distributed or made commercially available to third parties and (ii) fine tune models provided as part of our Services
But then those models possibly get used downstream, e.g. for Mistral's "medium" API model (and those of many other startups).
I guess if it's behind an API and no one discloses the training data, OpenAI can't prove anything? Even obvious GPT-isms could ostensibly come from internet data.