Hacker News

DALL-E seems to handle more complex text prompts with scene compositions better than Stable Diffusion:

With SD, if I write "$X walking on the moon with $Y in a bold majestic style", the results will often only include $X OR $Y. DALL-E appears to do a better job of identifying multiple subjects in this way.

But it's hard to beat free: SD can't quite run locally on my laptop's 2GB-RAM GPU, but I doubt it will take much longer for flexible distros to offer click-to-install versions that run on most setups with something resembling a discrete GPU.



I have spent a lot of time banging my head against more complex prompts in SD. Trying to get it to understand separation of objects and composition is my biggest issue with this otherwise awesome tool.

It took me a lot of tries and about half an hour to generate a blue lemon on a (blue) marble countertop instead of a yellow lemon on a blue marble countertop.

I spent longer than I'd like to admit trying to get an image of a human running from a horde of zombies. No matter what I did I always got a zombie leading a charging horde of zombies.

I've completely given up on trying to describe the layout of a complex scene, but that can be solved with img2img.
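For anyone who hasn't tried the img2img route: the idea is to rough out the composition yourself and let the model refine it. A minimal sketch of that workflow, assuming the Hugging Face `diffusers` library for the refinement step (the model name, prompt, and block colors below are just illustrative placeholders):

```python
from PIL import Image, ImageDraw

# Rough out the composition with plain color blocks: sky, ground,
# a lone figure on the left, a horde on the right.
init = Image.new("RGB", (512, 512), (135, 206, 235))   # sky
draw = ImageDraw.Draw(init)
draw.rectangle([0, 384, 512, 512], fill=(90, 90, 90))   # ground
draw.rectangle([60, 280, 110, 400], fill=(200, 60, 60)) # lone human
for x in range(300, 480, 40):                           # zombie horde
    draw.rectangle([x, 280, x + 30, 400], fill=(60, 140, 60))

# Refinement step (needs a GPU and model weights, so shown as a comment):
# from diffusers import StableDiffusionImg2ImgPipeline
# pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5").to("cuda")
# out = pipe(prompt="a man running from a horde of zombies",
#            image=init, strength=0.7).images[0]
```

The `strength` parameter controls how much the model is allowed to deviate from the rough layout; lower values stick closer to your blocked-out composition.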

It is sometimes very difficult to get characters to interact with objects or each other in natural ways, and whether it works can be unpredictable. You can consistently get very common interactions right, like people dancing or riding a bicycle. But good luck trying to generate, say, a photograph of a man poking a sheep in the ear. OK, I tried that one[0], and after a few minutes and about 16 generated images I actually got a few that were sort of accurate.

[0]: https://imgur.com/a/11fc2wj



