ControlNet is an obvious counterexample. If you think "diffusion is just collaging", upload a control image using this space that cannot exist in the source dataset (e.g. a personal sketch) and generate your own image: https://huggingface.co/spaces/AP123/IllusionDiffusion
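For reference, here is a minimal sketch of what "conditioning on your own control image" looks like with the diffusers library. The checkpoint names and the sketch filename are illustrative assumptions, not necessarily what the linked space runs (IllusionDiffusion uses its own ControlNet variant):

```python
# Minimal ControlNet sketch (assumes diffusers, torch, a CUDA GPU).
# Model IDs and "my_sketch.png" are placeholders for illustration.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# A ControlNet trained to follow scribble/sketch edges
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A personal sketch that cannot appear in any training set
control_image = load_image("my_sketch.png")

# The generated image follows the sketch's structure while the prompt sets the content
image = pipe(
    "a watercolor landscape",
    image=control_image,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```

The point of the example is that the spatial structure of the output is dictated by an image that, by construction, the model has never seen.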
It’s not that I think it’s purely collage, but that inputting a high-quality image doesn’t somehow lead to generating better-quality output by default. The various silly images produced with keywords like “Greek sculpture” or “Mona Lisa” are examples of this.