
Thoughts:

- ChatGPT integration is absolutely huge (ChatGPT Plus and enterprise integrations coming in October). This may severely hurt Midjourney and a whole bunch of other text-to-image SaaS companies, leaving them to focus only on NSFW use cases.
- Quality looks comparable to Midjourney, but Midjourney has other useful features like upscaling, creating multiple variations, etc. Will DALL-E 3 keep up, UX-wise?
- I absolutely prefer ChatGPT over Discord as the UI, so UI-wise this wins for me.



What I think could be amazing about ChatGPT integration is (holding my breath…) the ability to iterate and tweak images until they’re right, like you can with text with ChatGPT.

Currently with Midjourney/SD you sometimes get an amazing image, sometimes not. It feels like a casino. With SD you can mask and try again, but it's fiddly and time-consuming.

But if you could say 'that image is great, but I want there to be just one monkey, and can you make the sky green' and have it take the original image and modify it, that would be a frikkin game changer and everyone else would be dust.

This _probably_ isn’t the way it’s going to work. But I hope it is!


I am using Automatic1111 and SDXL and this is exactly the workflow people are doing with that.

Generate an image, maybe even a low-quality one. Fix the seed and then start iterating on that.


Do you have a link on how this is done? What does "fix the seed" mean? I was experimenting last night with seed variation (under the Extra checkbox next to the seed input box) and I couldn't accurately describe what the sliders did. Sometimes I'd get 8 images that were slight variations, and sometimes I'd get 3 of one style with small variations, and then 5 of a completely different composition with slight variations, in the same batch.

As far as the OP goes, they claim you don't need to prompt-engineer anymore, but they just moved the prompt engineering into ChatGPT, with all of the fun caveats that come with that.


I think by "fixing the seed" they just meant using the "Reuse seed from last generation" button. By default, this will mean that all future images will get generated with the same seed, so they'll look identical. The "variation seed" thing mixes slight variations into this fixed seed, meaning that you're likely to get things that look similar, but not identical to the original. The "variation strength" slider controls how much impact from the variation seed is mixed into the original, with 0 being "everything from the initial seed", and 1 being "everything from the variation seed". I'm pretty sure that leaving the width/height sliders at their default positions is fine.

Also, just for future reference: lots of UI elements in A1111 have very descriptive tooltips; they help a lot when I can't quite remember the full effect of each setting.
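For intuition, here's a rough standalone PyTorch sketch of the general idea behind variation seeds - blending a second seed's noise into the main seed's noise by the variation strength. It is not A1111's actual implementation, and the latent shape, seeds, and strength value are just illustrative.

```python
# Rough sketch (NOT A1111's exact code): mix a "variation seed" into a fixed
# main seed by spherically interpolating between their latent noise tensors.
import torch

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation between two noise tensors."""
    v0_flat, v1_flat = v0.flatten(), v1.flatten()
    dot = torch.sum(v0_flat * v1_flat) / (v0_flat.norm() * v1_flat.norm())
    theta = torch.acos(dot.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:  # nearly parallel: fall back to plain lerp
        return (1 - t) * v0 + t * v1
    return (torch.sin((1 - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)

shape = (1, 4, 128, 128)  # illustrative SDXL latent shape for a 1024x1024 image

main_noise = torch.randn(shape, generator=torch.Generator().manual_seed(1234))
var_noise = torch.randn(shape, generator=torch.Generator().manual_seed(9876))

strength = 0.3  # 0 = everything from the main seed, 1 = everything from the variation seed
latents = slerp(main_noise, var_noise, strength)
# `latents` would then be fed to the diffusion pipeline instead of fresh random noise.
```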


Yes, that describes it pretty well.

By default the random seed is different for each generation, so even if the prompt and every setting are the same you'll get vastly different images (which is the expected outcome).

When you set the seed to a fixed number you get the same image every time you generate if all other settings and the prompt are exactly the same.

So I start off with a pretty high CFG Scale to give it enough room to imagine things (Refiner and Resize turned off).

Once I have a good base image, I'll send it to img2img and iterate some more on it.
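As a concrete (hedged) illustration of that "fix the seed, then iterate in img2img" workflow outside the A1111 UI, here's a minimal sketch using the diffusers library; the model ID, prompts, guidance scale, and strength value are assumptions, not a prescription.

```python
# Minimal sketch: fixed-seed txt2img base image, then img2img iteration on it.
import torch
from diffusers import StableDiffusionXLPipeline, AutoPipelineForImage2Image

seed = 1234  # fixed seed: same prompt + same settings => same base image every run

txt2img = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(seed)
base = txt2img(
    "a monkey sitting in a jungle, green sky, oil painting",
    guidance_scale=7.0,
    generator=generator,
).images[0]

# Reuse the same components for img2img and keep iterating on the base image.
img2img = AutoPipelineForImage2Image.from_pipe(txt2img)
generator = torch.Generator("cuda").manual_seed(seed)
iterated = img2img(
    "a single monkey sitting in a jungle, green sky, oil painting",
    image=base,
    strength=0.35,  # low strength keeps the composition, changes the details
    generator=generator,
).images[0]
iterated.save("iterated.png")
```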


Thanks for the info. I don't use img2img except when I want to "AI" a picture, but this makes sense as one of the steps. I usually just "run off"* dozens of images and pick the best one - but I don't do this for a living or anything, just to be able to "find" a picture of something in less than a minute.

* like a mimeograph!


Not quite. In A1111, the process is "pick a seed and iterate the prompt". You're still regenerating the image from noise, and if SD decides to hallucinate a feature that wasn't in the prompt, it's often very hard to remove it.

Since GPT-4 natively understands images, there's the potential for it to look at the image and understand what about it you want to change.


Exactly - iterative "chat-driven" improvement of images will be a paradigm shift


That looks exactly like what was in the video (though the video seems to be down?)


Didn’t see a video! OK I am pretty excited now.

So long as it can actually create an image that's virtually the same but changed only in the way I want. That would just blow everything else away.

I mean, I guess it's not that ridiculous. Generative fill in Photoshop is kind of this, but the ability to understand from a text prompt what I want to 'mask' - if that's even how it would work - would be very clever.


You can _kind_ of do this now, but it's excessively manual. I agree that it would be awesome to have the AI figure out what you mean when iterating on a given image.

I recently had a somewhat frustrating experience of having a generated foreground thing and a generated background that were both independently awesome - and trying to wheedle Stable Diffusion into combining them the way I wanted was possible, but took a lot of manual labor, and the results were "just ok".


DALL-E 2 had variations, inpainting, etc. well before Midjourney. However, I 100% agree that it's going to be interesting to see who and what wins this race.


Stability AI with Stable Diffusion is already at the finish line in this race: it's $0, open source, not exclusively a cloud-based AI model, and can be used offline.

Anything else that is 'open source' AI and allows on-device AI systems eventually brings the cost to $0.


I agree. I am barely excited for DALL-E 3 because I know it's going to be run by OpenAI who have repeatedly made me dislike them more and more over the last year plus. My thoughts are: "Cool. Another closed system like MidJourney. Chat integration would be cool but it's still going to likely be crazy expensive per image versus infinite possibilities with Stable Diffusion."

Especially with DALL-E. Honestly, I'd be more excited if MidJourney released something new. DALL-E was the first but, in my experience, the lower-quality option. It felt like a toy. MidJourney felt like a top-tier product, akin to Photoshop Express on mobile: still limited, but amazing results every time. And Stable Diffusion feels like Photoshop, allowing endless possibilities locally without restrictions - except it's FREE!


They all have their place. OpenAI literally started every major AI revolution including image gen with DALL-E. Let them be the peleton while SD and others follow closely and overtake eventually.


I like painting them as the peleton! You're not wrong, it's just not super exciting for me.


What's a peleton? Genuine question from a non-native English speaker.


I think they mean peloton. It's not English; I believe it's French, meaning a group of cyclists. In this context it refers to the peloton leading the race.


There has to be a way to link the API from Automatic1111 to a "GPT" or "BERT" model to allow similar flexibility, right? The only issue I see is training the LLM on the rules of image composition correlated to what CLIP/Deepbooru sees. Maybe there will be a leak, or someone can convince one of the "AI art example/display" sites to give them a dump of images with all metadata. Enough of that and this sort of thing seems like a "gimme".
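The wiring part is already doable against A1111's built-in HTTP API (exposed when the webui is launched with --api). Here's a hedged sketch; rewrite_prompt is a hypothetical stand-in for whatever GPT/BERT-style model does the prompt rewriting, and the payload values are just examples.

```python
# Sketch: have an LLM rewrite a plain-English request, then send it to A1111's
# txt2img endpoint (webui started with --api, default port 7860).
import base64
import requests

def rewrite_prompt(user_request: str) -> str:
    # Hypothetical placeholder: call your LLM of choice here.
    return user_request + ", highly detailed, dramatic lighting"

payload = {
    "prompt": rewrite_prompt("a single monkey under a green sky"),
    "steps": 30,
    "cfg_scale": 7,
    "seed": 1234,  # fixed seed so later prompt tweaks stay comparable
    "width": 1024,
    "height": 1024,
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()

# A1111's API returns images as base64-encoded PNGs.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```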

I just started training LoRAs, and the lack of definitive information is a burden; but by following the commonalities in all of the guides I was able to get a LoRA to model a person I know, extremely accurately, about 1/8th of the time. In my experience, getting a "5 star" image 12% of the time is outstanding. And when I say 1/8th, I mean I ran off 100 images and hand-ranked the ones that actually used the LoRA correctly, and the 4- and 5-star ranks were 21/100 - I just double-checked my numbers!
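For anyone curious what "running off" a batch like that looks like outside A1111, here's a minimal sketch using diffusers; the LoRA file name, trigger word, and prompt are placeholders that depend entirely on how your own LoRA was trained.

```python
# Sketch: load a trained LoRA and batch-generate candidates with different seeds
# so they can be hand-ranked afterwards.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("person_lora.safetensors")  # placeholder path

prompt = "photo of ohwx person hiking in the mountains, golden hour"  # placeholder trigger word

for seed in range(100):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, guidance_scale=7.0, generator=generator).images[0]
    image.save(f"candidate_{seed:03d}.png")  # rank these by hand afterwards
```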


That's a different race entirely; most people don't even have the hardware, let alone the knowledge to run SD.


unless it's way worse


SDXL and older models are quite good. It's not yet very user friendly but will get there.


> ChatGPT integration is absolutely huge

Probably not. Bing Chat (which uses GPT-4 internally) already integrates Bing Image Creator (which uses DALL-E ~2.5 internally), and it isn't good. It just writes image prompts for you, when you could simply write them yourself. It's a useless game of telephone.


To be fair, nsfw is a considerable market, which itself is plenty big enough to hold many, many businesses. llm trained on literotica, anyone?


> llm trained on literotica, anyone?

I'm not going to post links, but there are several active projects & companies already doing that.


I don't see it. I use ChatGPT to create prompts for Midjourney, and it only takes me a couple of clicks. I don't see the massive difference, especially since Midjourney is much, much better than DALL-E.




