Hacker News

DALL-E seems to handle more complex text prompts with scene compositions better than Stable Diffusion:

With SD, if I write "$X walking on the moon with $Y in a bold majestic style", the results will often only include $X OR $Y. DALL-E appears to do a better job of identifying multiple subjects in this way.

But it's hard to beat free: SD can't quite run locally on my laptop's 2GB-RAM GPU, but I doubt it will take much longer for flexible distros to offer click-to-install versions that run on most setups with something resembling a discrete GPU.



I have spent a lot of time banging my head against more complex prompts in SD. Trying to get it to understand separation of objects and composition is my biggest issue with this otherwise awesome tool.

It took me a lot of tries and about half an hour to generate a blue lemon on a (blue) marble countertop instead of a yellow lemon on a blue marble countertop.

I spent longer than I'd like to admit trying to get an image of a human running from a horde of zombies. No matter what I did I always got a zombie leading a charging horde of zombies.

I've completely given up on trying to describe the layout of a complex scene, but that can be solved with img2img.
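For anyone who hasn't tried the img2img route: the idea is to rough out the composition yourself and let the model refine it. A minimal sketch of that workflow, assuming the Hugging Face `diffusers` library for the refinement step (the model name, prompt, and block colors below are just illustrative placeholders):

```python
from PIL import Image, ImageDraw

# Rough out the composition with plain color blocks: sky, ground,
# a lone figure on the left, a horde on the right.
init = Image.new("RGB", (512, 512), (135, 206, 235))   # sky
draw = ImageDraw.Draw(init)
draw.rectangle([0, 384, 512, 512], fill=(90, 90, 90))   # ground
draw.rectangle([60, 280, 110, 400], fill=(200, 60, 60)) # lone human
for x in range(300, 480, 40):                           # zombie horde
    draw.rectangle([x, 280, x + 30, 400], fill=(60, 140, 60))

# Refinement step (needs a GPU and model weights, so shown as a comment):
# from diffusers import StableDiffusionImg2ImgPipeline
# pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5").to("cuda")
# out = pipe(prompt="a man running from a horde of zombies",
#            image=init, strength=0.7).images[0]
```

The `strength` parameter controls how much the model is allowed to deviate from the rough layout; lower values stick closer to your blocked-out composition.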

It is sometimes very difficult to get characters to interact with objects or each other in natural ways, and whether it works can be unpredictable. You can consistently get very common interactions right, like people dancing or riding a bicycle. But good luck trying to generate, say, a photograph of a man poking a sheep in the ear. OK, I tried that one[0], and after a few minutes and about 16 generated images I actually got a few that were sort of accurate.

[0]: https://imgur.com/a/11fc2wj



