Google Imagen 2 (cloud.google.com)
244 points by geox on Dec 13, 2023 | 181 comments


This post has more information: https://cloud.google.com/blog/products/ai-machine-learning/i...

I can't figure out how to try this thing. The closest I got was this sentence:

"To get started with Imagen 2 on Vertex AI, find our documentation or reach out to your Google Cloud account representative to join the Trusted Tester Program."


I think the process is

1. Go to console.cloud.google.com

2. Go to model garden

3. Search imagegeneration

4. End up at https://console.cloud.google.com/vertex-ai/publishers/google...

And for whatever reason that is where the documentation is.

Sample request

    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @request.json \
        "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
Sample request.json

    {
      "instances": [
        {
          "prompt": "TEXT_PROMPT"
        }
      ],
      "parameters": {
        "sampleCount": IMAGE_COUNT
      }
    }
Sample response

    {
      "predictions": [
        {
          "bytesBase64Encoded": "BASE64_IMG_BYTES",
          "mimeType": "image/png"
        },
        {
          "mimeType": "image/png",
          "bytesBase64Encoded": "BASE64_IMG_BYTES"
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID",
      "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
      "modelDisplayName": "MODEL_DISPLAYNAME",
      "modelVersionId": "1"
    }
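And here's a rough Python equivalent of that curl call, in case it helps anyone - an untested sketch, stdlib only, reusing gcloud for the token; the project ID, prompt, and sample count are placeholders just like in the curl example:

    import base64, json, subprocess
    import urllib.request

    PROJECT_ID = "PROJECT_ID"  # placeholder, same as in the curl example
    ENDPOINT = (
        "https://us-central1-aiplatform.googleapis.com/v1/projects/"
        f"{PROJECT_ID}/locations/us-central1/publishers/google/models/"
        "imagegeneration@002:predict"
    )

    # Reuse gcloud's access token, exactly like the curl example does.
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()

    body = json.dumps({
        "instances": [{"prompt": "TEXT_PROMPT"}],
        "parameters": {"sampleCount": 2},
    }).encode("utf-8")

    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )

    with urllib.request.urlopen(req) as resp:
        predictions = json.load(resp)["predictions"]

    # Each prediction carries the PNG bytes base64-encoded.
    for i, pred in enumerate(predictions):
        with open(f"image_{i}.png", "wb") as f:
            f.write(base64.b64decode(pred["bytesBase64Encoded"]))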
Disclaimer: Haven't actually tried sending a request...


I can confirm that a month ago there was a bug where you could try Imagen just by changing JS variables (but it didn't work for video generation).

Of course it became immediately obvious to me why the model isn't public. It's just not as good as advertised, that's why. Google should stop deceiving the public.


Once I finally got mostly set up for that, with billing and everything, it said it's only available for a limited number of customers, with a "request access" link to a google form with further links (to enable https://aiplatform.googleapis.com/) which 404.

What a shitshow.


Google seems to be desperately trying to show that they’re still relevant in AI, but they always end up with half-assed demos and presentations of products that don’t exist yet.


Isn't "half-assed" a fairly accurate description of basically every Google product since Gmail and Android (and arguably that's been a rolling dumpster fire)?

Even calendaring was something that took ages for them to get right. For something like a decade you couldn't move an event from one calendar to another on Android - only via the desktop web view.

Google went from being an innovative company to a web version of IBM...a giant lumbering dinosaur that can't get out of its own way, and everyone kinda needs but also deeply loathes


How has Android been a dumpster fire? I don't even think it's arguable.


They may have had the occasional advantage over iOS for a brief period 5-6 years ago. Nowadays they clearly lag behind Apple on everything and even their Pixel line does a half-assed copy/paste of the latest iPhone features (or introduces features which are shiny but have some sort of downside or require cloud processing to work).

Android's advantage has always been that everyone else gets to play. And it's good that we have that. But they aren't exactly the beacon of innovation they think they are or claim to be in marketing copy.


Especially in regards to Android, feature parity is just not that high on my wishlist. It makes calls, apps run on it, happy days. It's like knowing my car has running gear parity with another brand, when most of us are just using them to drive to work.


This is your opinion with nothing from a technical standpoint.


Sure, that’s fair.


Weird, just tried in my terminal and it works fine. My account definitely has no special permissions, I've never requested any, I've probably spent less than $100 total on it (and that almost entirely on domain names).

Results: https://imgur.com/a/JIiuDt9


Self-reply since I can't edit the post anymore. Tried this, and the API seems to work just fine for me with no extra permissions.

Results (these are the only two images I generated): https://imgur.com/a/JIiuDt9


This is giving me PTSD flashbacks to working with Google Cloud: weird "console" pages hidden deep in some Yggdrasil-sized tree structure, undocumented APIs, and labyrinthine authentication processes unknown to everyone, even Google themselves.


This page might be somewhat helpful: https://cloud.google.com/vertex-ai/docs/generative-ai/image/...

It also includes a link to the TTP form, although the form itself seems to make no reference to Imagen being part of the program anymore, confusingly. (Instead indicating that Imagen is GA.)


> GA.

Generally Available?


Yes


Google, save the marketing fluff, just let us play with the toys.


Yeah, seriously, this is a joke by now. Good research, but product-wise they are the slowest behemoth: impossible to contact, extremely convoluted in their communication, and their interfaces are a Kafkaesque maze.

OpenAI really shows how it's done, as does the way Mistral just dumps a torrent on everyone. That's marketing I can respect.


The post actually says that it's for approved users only.

>> generally available for Vertex AI customers on the allowlist (i.e., approved for access).


"To get started with Imagen 2 on Vertex AI, find our documentation or reach out to your Google Cloud account representative to join the Trusted Tester Program."

And also be prepared to wait somewhere between 6 and infinity months... at this point the Google Cloud account reps can't even grease the wheels for us.


So it is "Generally" Available.


Ok, we'll change to that from https://deepmind.google/technologies/imagen-2/ above. Thanks!


They've been emailing me for quite some time saying I have access as part of the Trusted Tester program, yet I still do not. I can caption images but nothing else. So disappointed.


Google desperately needs to get their platform/docs in order. It is incredibly difficult to use any of their new AI stuff. I have access to Imagen (which was a rodeo to get on its own), but do not know if it is v1 or v2, for example.


They need to ditch Sundar, I don't know what the hell they are thinking. Google so badly needs reorganization.


$


Why? The stock has consistently risen since Sundar was hired. So all is well! /s


This would have been an epic release two years ago, but there are now many well-established models in this area (DALL-E, Midjourney, Stable Diffusion). It would be great to see some comparisons or benchmarks to show Imagen 2 is a better alternative. As it stands, it's hard for me to tell if this is worth switching to.


> it's hard for me to tell

I can only compare it to Stable Diffusion, but Imagen 2 seems significantly more advanced.

Try to do anything with text in SDXL. It's not easy and often messes up. I don't think you can get a clean logo with multiple text areas with SDXL.

Look at the prompt and image of the robin. That is mighty impressive.


> I can only compare it to Stable Diffusion, but Imagen 2 seems significantly more advanced.

I wouldn't say this until we are able to try it for ourselves. As we know, Google is prone to severe cherry picking and deceptive marketing.


Google has this thing of releasing concept videos but communicating them as product demos.

Overselling is not a winning strategy, especially when others are shipping genuinely good products.

Every time Google shows off something new, the first thing people now ask is what part Google faked (or cherry-picked to an extreme).


Stability AI has gaps in SDXL for text, but they seem to do a better job with Deep Floyd ( https://github.com/deep-floyd/IF ). I have done a lot of interesting text things with Deep Floyd


Looks good. But 24GB of vram is quite a lot for 1024x1024


This is a pixel diffusion model that doesn't use latent space encoding, hence the memory requirements. Besides, good prompt understanding requires large transformers for text encoding, usually far larger than the image generation part. DF IF is using T5.

You can use Harrlogos XL to produce text with SDXL, although it's mostly limited to short captions and logos. The other way (ControlNets) is more involved, but actually useful.
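
For anyone curious, loading a text-focused LoRA like that into an SDXL pipeline is only a couple of lines with diffusers. Rough sketch below; the LoRA filename is hypothetical (download the actual file from Civitai first), and the prompt is just an example:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical filename: point this at the Harrlogos XL LoRA you downloaded.
    pipe.load_lora_weights(".", weight_name="Harrlogos_XL.safetensors")

    image = pipe(
        'minimal logo with the text "HACKER NEWS", clean flat vector style',
        num_inference_steps=30,
    ).images[0]
    image.save("logo.png")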


Yeah, Stable Diffusion has very limited understanding of composition instructions. You can reliably get things drawn, but it's super hard to get a specific thing in a specific place (e.g. "a man with blonde hair near a girl with black hair" is going to assign hair color more or less randomly, and there's no guarantee on how many people will be in the picture). Regional prompting and ControlNet somewhat help, but regional prompting is very unreliable and ControlNet is, well, not text-to-image.

DALL-E 3 gets things right most of the time.


Right? This page looks like basically every other generative image AI announcement page as well as basically every model page. They show a bunch of their cherry-picked examples that are still only like "pretty good" (relative to the rest of the industry, it's incredible tech compared to something like deepdream) and give you nothing to really differentiate it.


I was going to state pretty much the same - the obvious - while also adding insult to injury by saying that, with the recent announcements in the last few weeks, it seems Google desperately needs to shine in the world of AI, but fails to do so (despite 2000+ votes for the new Bard, which is still not so good).

Now, from a designer's perspective, honestly, I don't care too much who the provider of the image is, since one will have to work more on it anyway. So designers, illustrators, etc. are not the target for such platforms, even though that seems counter-intuitive. If you ask me which system was the source for an image used for a poster in the last 12 months... well, I may remember, but it is not of paramount importance to the end result. After a year of active usage of DALL-E 2/3, SDXL, and Midjourney (which is also SD of some sort), I can confidently state that there is much more work to do, and a lot of prompting, to actually get something unique and worth using. Sadly, the time taken is comparable to working with an actual artist. Of course, the latter is likely to be hit by this new innovation, but perhaps not so much.

From the perspective of someone integrating text-to-image - which is yet to be seen done in a reasonable manner, like for a quest game with generative images - the API flexibility and cost would be the most important qualifiers. Even then it may actually be better to run SD/XL. From a cost perspective, all these services are still very pricey for anything more serious than a few one-shot images.


Kinda scratching my head at the purpose of the prompt understanding examples they show off. From previous papers I've seen in the space, shouldn't they be trying various compositional things like "A blue cube next to a red sphere" and variations thereof?

Instead they use

>The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off - and they are nearly always doing it.

And show off the result being a photograph of a robin, cool. SDXL[0] can do the exact same thing given the same prompt, in fact even SD1.5 would be able to easily[1].

[0]https://i.imgur.com/rsgtYbf.png

[1]https://i.imgur.com/1rcQpcQ.png


I've developed two tests for AI image generators to see if they've actually advanced to "the next level". Take literally any AI image generator and give it one of these prompts:

"A flying squirrel gliding between trees": It won't be able to do it. Just telling it "flying squirrel" will often generate squirrels with bat wings coming off their backs.

Ahh, but that's just a tiny, specific thing missing from the data set! Surely that'll get fixed eventually as they add more training data...

"A fox girl hugging a bunny girl hugging a cat girl": The only way to make this work is with fancy stuff like Segment Anything (SAM) working with Stable Diffusion. Alternative prompts of the same thing:

"A fox girl and a bunny girl and a cat girl all hugging each other"

It's such a simple thing; generative AI can make three people hugging each other no problem. However, trying to get it to generate three different types of people in the same scene is really, really hard and largely dependent on luck.


I tested the prompts with DALL-E 3 (through the API).

The flying squirrel one was spot on: it showed an image of the trees and a squirrel with wings, which kind of looked like bat wings.

The three girls hugging each other, however, only worked fairly well: it always created three different types of people, but they never all hugged each other. Either two of the three hugged each other, or no one hugged anyone.


(Deleted)


In SD you can add words like twins, brothers, clones, repetition and copy to your negative prompt. It won't fix the problem, but it will help.

Would be a lot easier if AfterDetailer could handle dynamic prompts.


The prompt is a quote from a book, and it mentions "opened his beak and sang a loud, lovely trill". The Imagen 2 robin does exactly that, but both SD results ignored it completely, and the SD 1.5 one isn't even on top of the wall.


I asked imagen 2 to generate a transparent product icon image, and it generated an actual grey and white square pattern as the background of the image... https://imgur.com/a/KA2yWHp


That's because it was trained on RGB images without an alpha channel. There is currently no public image generator that understands alpha channel.


As a user, this really frustrates me. Prompting is not precise enough to compose a bunch of specific elements, so the obvious solution is to do several prompts, each with transparency, and then combine them in Photoshop/Photopea. I end up asking for a white background and then cutting it out manually.


I feel like someone could solve this with a little background-removal AI in the pipeline. I also go through the same process, stitching together a few tools, and obviously it's possible... but it sure would be nice if it all fit together better. Something where "transparent background" was translated to "white background" and the result then went through background removal, as sketched below.
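
Something like this already hangs together with off-the-shelf pieces - a minimal sketch, assuming the rembg package for the removal step; the text-to-image call is a hypothetical placeholder for whatever backend you use:

    from PIL import Image
    from rembg import remove  # pip install rembg

    def generate_with_transparency(prompt: str) -> Image.Image:
        # Models only emit RGB, so swap the transparency request for a white backdrop.
        safe_prompt = prompt.replace("transparent background", "plain white background")

        # Hypothetical placeholder: call your text-to-image backend of choice here.
        rgb_image = my_text_to_image(safe_prompt)

        # Then cut the background out locally to get a real alpha channel.
        return remove(rgb_image)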


The closest I've found is vector generative AI like what's in Adobe Illustrator today.


Like the other commenter said, these models aren't trained against images with an alpha channel. Given the same sized model that'd make typical results worse to benefit a niche case. You should be able to have them generate this style image on a background you can color key out though.


Those examples look nice and would be trivial to automatically cut out/trace into transparent vector with inkscape


Thankfully, MacOS and iOS have a fantastic ML powered "extract the image content in to a new image with transparent background" function that you could use on this silly output to get what you want.


Luckily there is another AI for removing the background (:


Wow, Google has really become the IBM of the mid-2000s. All flashy demos, 'call sales' to try anything.


Idk they did release public access to Gemini on day one. At least for one of the versions.


This confirms that they don't really withhold access to other models because of "safety", but simply because those models are not as good as advertised.


I don't think this confirms that. They could just be better at managing their concerns around LLM safety before announcement.


According to Fiona Cicconi, Google’s chief people officer, Google employed 30,000 managers before the recent layoffs. The hard truth is Google needs a Twitter style culling. Take all those billions you're burning and give it to people with a builder mentality, not career sheeple. Unfortunately the same executives who would oversee this are the ones who need to be culled first.


From what I understand, Google has an unusually large number of engineers who are happy to coast and who actively avoid taking on anything important. That seems like more of an issue to me compared to middle-management bloat.


I worked at Google from 2012 to 2022 and this didn't match my experience, for what it's worth. There were some people who coasted, but it was not common. There were a lot of people who got much less done than you might expect due to bureaucratic friction, but my coworkers were generally very enthusiastic to take on important things.


Important things and delivery of features are two very different things.

Rewriting a Linux kernel module is "important", but rarely impactful.


Right, but the engineering output exists... in the google graveyard.


There are plenty of ICs who coast there, but what's far worse are the groups of ICs who are all pushing hard in different directions because their leadership isn't taking charge. IDK if middle management bloat exactly is the problem either, but there's some kind of ineffectiveness, maybe even at the top.

One low-level issue is how long everything has to take because of tooling. Engineers have way too much patience for overcomplicated garbage and tend to obsess over pointless details. Kind of in the opposite direction of coasting, but still a real problem.


Actively meandering in the wrong direction.


Actively deprecating random stuff for no reason


I've seen this claim thrown around a few times but haven't really seen any evidence that it's true, beyond a few unconvincing anecdotes.


How come you're readily willing to accept that managers will coast, but not that engineers will coast?


Personally I've found that engineering/designer types, including myself, are often a bit on various ADHD/autism-like spectrums, with tendencies to overwork, hyperfocus, and in general "attach themselves very much to some domain" - not that this is always a good thing.

I've met many from the managerial class without these traits who seem to have no problem coasting and transcending actual meticulous work, because their game is all about personal career management, not the hyperfocus a lot of us here engage in daily.


Really? how many engineers do you know who work at Google? Do they say they are working hard or coasting? A big selling point of working at Google is that it's a known place you can coast and get a big paycheck.


What kind of evidence would this involve?

Would you have agreed this was the case at Twitter for a while?


That wasn't even true at Twitter and it's really trivial to verify that even now.

Stop attacking other people and mind your own business, especially if you're making stuff up.


Twitter is barely working. They had to cut browsing for people who aren't logged in.


It's true; I was there from 2016 to 2023.

People just have different definitions of what coasting means. In general, don't think "doing nothing" or "avoiding work"; think "adding certainty to process and decision-making like everyone else does", and, much more importantly, "avoiding friction, because as soon as there's even a little bit, people leverage it".

More detail on what causes this:

- processes become elongated through what Steve Yegge called cookie-licking, more specifically, anyone above line level doing "I am the 10th person who needs to give a green light for this to happen"

- the elongated process taking so long with that many people that some people lose interest or move on or forget they already approved it

- business disruptions (ex. now Sundar told VP told VP told VP who told director to add GenAI goals)

- bad managers are __really__ bad at BigCo, there's so much insulation from reality due to the money printer, and cultural bias towards "meh everythings good!"

- managers trying to get stuff done rely on people who slavishly overwork to do the minimum possible for their _direct manager_ to be happy

- only needing to keep your manager happy, and your manager being focused on deploying limited resources, creates a suspicious untrusting atmosphere. The amount of othering and trash-talking is incredibly disturbing.

- _someone_ has to slavishly overwork on any given project because there's very little planning, due to the "meh, everything's good!" inclination, coupled with software being pretty hard to plan accurately anyway. So what's the point of planning at all?

- newly minted middle managers are used to clinging onto anything their manager cares about and overworking, so they end up being a massive bottleneck for their reports. The new middle manager on my team had a profile page that looked like a military dictator's chest of medals: 6 projects they were "leading", 1 of which they were actually working on and actually got done.

- The "coaster" realizes: "if I go outside the remit of what my manager asked for, they A) won't care, because they didn't ask for it, B) which exposes me to non-zero friction, because they'll constantly be wondering why I'm doing it at all, C) I'll have to overwork, because they won't help plan or distribute work since it was my idea to go beyond the bare minimum, D) it's very, very hard to get promoted, especially based on work my manager didn't explicitly ask for, and E) the cultural bias here is strongly towards everything being okay all the time no matter what, so any visible friction will be attributed to me personally being difficult."

And that's _before_ you account for the genuine sociopathy you see increasingly as you move up the ladder.

Anecdote:

I waited _3 years_ to launch work I had done and 3 VPs asked for. Year 3, it came to a head b/c one of the 3 was like "wtf is going on!?" My team's product manager outright pretended our org's VP didn't want it, had 0 interest in it, after first pretending it didn't _come up at all_ in a meeting arranged to talk about it.

Within a couple weeks this was corrected by yet another VP meeting where they called in the PM's boss' boss' boss and the VP was like "fuck yeah I want this yesterday", but engineering middle manager and PM closed ranks to blame it on me. Engineering went with "Where's the plan / doc!?!?" (I won't even try to explain this, trust me, after 3 yrs they knew and there were docs), and both pretended I was interrupting meetings regularly (I was the only one who ever wrote anything on the agenda, and once we hit year 2.5, I was very careful to only speak when called upon because it was clear it was going to build up to this, as they were assigned the new shiny year-long project to rush a half-assed version of Cupertino's latest, as they were every year).


For anyone reading, if you care about your work, dysfunctional org situations like that will kill you with stress. Either fix the situation or get away, sooner rather than later. Almost nothing is worth that.


^ this. 100x.

For camaraderie along the way:

- any peer to peer counseling / mentorship at your company. having someone senior in an unrelated division caring about it, and who I also could trust to be honest with me about when it was my fault vs. I was being railroaded helped a ton

- Blind (the app). Standard perils of internet anonymity and verbal brutality, but, at least you'll always get excellent advice. if you did your best, people aren't afraid to say it either.

- be aware of your company's policies on medical leave.

- leave sooner rather than later


I work at a big corp, and my experience is that there are swaths of the year (I'm talking weeks at a time) when upper management is engaged in faction warfare many levels above me, and my own manager doesn't know what the team should be working on until upper management can set priorities. So I have nothing to work on. I guess this is 'coasting'.


I remember a situation during my stint at Google: a VP involved in a not-really-competing internal system kept threatening us about comparing our system against theirs, so we wouldn't make them look bad. Even though there was an explicit internal benchmark (made by them!) that existed for exactly such comparisons!

Our team was flabbergasted that this could even be an issue.


That's an excellent example. TL;DR: Working through why people did that ultimately led to my epiphany that no, it wasn't that everyone else was lazy; they were just better-adjusted via understanding tradeoffs.

Longer version. Sorry to torture the threads with these, but I've noticed people don't take 'BigCo is a weird, strange, place' stories seriously unless there's a full anecdote coupled to it:

Google was my first real job, got very very lucky with a transition from dropout waiter => startup founder => sold => 9 months later, did interviews as a joke and...passed?

My first few years, I didn't understand this was happening, and eventually we got transferred to Android, and it was just an absolute directionless wasteland for at least 4 months. I couldn't even begin to understand why my peers A) had no work B) were fine with it C) when we tried talking about this, it was like we were speaking different languages.

I saw it as a 'leadership opportunity' and butted my/our way in to a big project and picked up another. Huge stuff. Visual redo of key property, and on the side, got a fundamental change to the input method for the same property, delivered by me client side and server side, then wheeled and dealed to get it deployed cross-platform.

That whole year peers didn't invest in the visual redo, even though it was ostensibly our team's work. Our newly promoted manager never planned / assigned work to people, and was out for about 50% of that first year.

It turned into Lord of the Flies while they were out. Only 2 peers worked on it out of 4. #3 helped out on a lower-key project. #4 focused on advocating for a feature that'd watch your screen and ex. tell you Infowars was Very Bad if you visited Infowars. At Old Google you could work on obviously bad ideas like this and you just wouldn't advance. It's a good thing that this would only last a month or two these days, if it happened at all.

Peer A was extremely confident but also extremely out of touch; for example, 2 weeks before launch they spent 5 minutes arguing with the partner team, telling them it was impossible that we had written all our code in $BINARY_A instead of $BINARY_B... which we had. When faced with the bare fact, they then went with "oh, no wonder nothing works" (???)

Peer B was relatively new to tech, so the histrionics the other would leap to had a massive influence on them. Always horrified we were doing anything at all without getting 3 separate approvals first, stapled to a direct request laying out exactly what was required, instead of just a Figma / GIF.

Peer B also got _insanely_ over-the-top mean to me after the project. Yet, they were nice and extremely intelligent generally.

That's when it finally clicked for me that something was off and I needed to approach the coasting question more inquisitively:

_what_ were they seeing differently?

They understood they were avoiding pain that they'd get ~0 credit for working through.

They were right.

I got excellent reviews from the partner team and product manager, I got awful reviews from Peer A and a meh one from peer B, and got a middling performance review after moving 2 mountains essentially solo.

I did get a $10K bonus, though; this was the standard payout for staying silent / not complaining after dealing with an obviously toxic situation.

I had to appeal to VPs for recommendations the next year to break through the "gee you moved two mountains and had great feedback from everyone _not_ on the team, but peer A and peer B didn't like you much"


Middle managers' job is to get the best out of engineers. If your direct manager does not set up an ambitious team with ambitious goals, what are you supposed to do?

Ambition trickles downwards and is killed upwards.


Both may be true. Culture isn't really necessarily that siloed between engineering and management.


> Google employed 30,000 managers before the recent layoffs.

I'm guessing that number included product/program managers, not just "people managers".


That’s still pretty insane.


or ask most of the people managers to become ICs and start actually doing something technical


How did the Twitter-style culling work out for Twitter?


AFAIK it worked out well. It works more or less the same as before, they shipped quite a bit of stuff, and they drastically reduced costs.


Shipped stuff? Like what?

Threads? Its usage is down 90% since its launch six months ago, presumably because they kept the people who could launch stuff and got rid of the people who had some idea of what should be launched.

The "Blue Checkmark" system? Released with no thought at all, an absolute disaster. Stephen King had to publicly announce that, despite indications to the contrary, he was not a paying user, and he felt it was important to tell people because he didn't want the idea that he was a paid subscriber to harm his reputation. Same underlying problem: the people who could ship things were still shipping things, but the people who could figure out what to make were gone.

And yes, they did drastically reduce cost...and much more drastically reduce revenue.


The issues with Twitter are currently about Musk buying it at a ridiculous price and his personal antics. Other than that, it works fine as always.

They shipped quite a bit of stuff, like the blue tick or revenue sharing. Other than Musk courting fascists and other kinds of undesirables, Twitter as a product is doing fine. It might go under, but if that happens it won't be because of a lack of employees.


It's absolutely full of spam, and the only ads left are crypto airdrops coming from what look like hacked accounts.


Have you used Twitter before the Musk takeover? That's exactly what you are describing. At worst, nothing changed. At best, in my experience, the spam problem has somehow gotten better. Many fewer spam accounts below every single post than there used to be.

I still think layoffs are bad because I don't care about corporate profit or efficiency, to be honest, but in this case it's a bit surprising how nothing concrete has actually changed even with an 80% reduction in staffing. 80% sounds apocalyptic to me, but again, Twitter just works like it did before, with the same annoying, never-fixed bugs (the occasional "something went wrong" on clicking tweets, etc.). But again, nothing close to the (technical) train wreck I would expect.


> Have you used Twitter before the Musk takeover?

Yes.

> Many fewer spam accounts below every single post than there used to be.

No, they're still there. They're even more there on popular posts.

A strange thing is that they never seem to ban the onlyfans bots, but they do hide them under "more replies" - so if you habitually expand that, you just keep seeing the same ones everywhere.

> but in this case it's a bit surprising how nothing concrete has actually changed even with an 80% reduction in staffing

That's not too surprising, because what the other people were doing was changing stuff. So now they're gone, things won't change, ever.


> more or less the same as before

Not if you have no account and are not in the US. Before, when I clicked on a Twitter link it worked 99.9% of the time. Now it is a lottery. Sometimes it loads without comments; most of the time it does not load at all.


Even with an account (though I'm also not in the US, but in the UK) I've recently (many months) found the Twitter site to be about as unreliable as the old, early fail whale days (except instead of a cute whale, I just get the page refreshing back to the top of my timeline, or a tweet loading then replies failing to load).

I used to hardly ever see spam, except when looking at replies to famous huge accounts, now I get 2-5 follows/likes/mentions a day from fake accounts mostly of semi-naked girls with a link to a website.

And any reasonably active thread of replies to a tweet now surfaces the idiotic nonsense of blue tick subscribers to the top, rather than ranking by tweet quality/relevance.


Google became so much of an ad company that they now confuse advertisement with actual product launches.


For the first Imagen (and for Parti) they released detailed papers. Now they do not even release benchmark results. A shame.


They never released Imagen 1 either, why do they even do these "releases"?


the post says it's generally available and includes instructions on how to use it via their API


The documentation [1] says otherwise. Image generation is "Restricted General Availability (approved users)" and "To request access to use this Imagen feature, contact your Google account representative."

[1] https://cloud.google.com/vertex-ai/docs/generative-ai/image/...


they should make it accessible at https://imagen.google like how meta did with https://imagine.meta.com


Don't forget Bing Image Creator: https://www.bing.com/images/create

My kids found it organically and were happily creating all sorts of DALL·E 3 images.


Strange dark pattern. Both have prompts without submit button.


Meta's one is a really good try. I've used it recently for a lot of stuff. It has way less censorship than DALLE3 via GPT Pro. I did eventually get banned for trying to make too many funny horror pics though.


The authors of the original Imagen paper have gone on to create https://ideogram.ai/


Without a paper about the architecture or the training setup, these announcements are particularly boring.

I was hoping to see some research development but nothing.


A potentially good summary: "We tried to clone Stable Diffusion except we used more GPUs in the process. However the dataset is so heavily censored that the results are disappointing."


I don't really care about these product images. The real test is whether it can produce pictures of hands with five fingers.


Allegedly Imagen 2 is indeed better at producing hands: https://deepmind.google/technologies/imagen-2/

> Imagen 2’s dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and keeping images free of distracting visual artifacts.


There's no actual way to use this.


Google needs to update their playbook. A publicly available demo goes a long way. Google of all companies shouldn't be coming off as the dinosaur when it comes to generative AI.


I don't see any examples of the things existing models really struggle with, like text or counting things.


there's literally a full section on that page called "Text rendering support" (with examples)


The link was changed since I posted my comment, and it's been two hours so I can no longer edit or delete my no-longer-relevant comment. Glad to see the text examples in this link.


So far the only new thing I'm seeing is the ability to handle text. Didn't Nvidia announce a model that could do that months ago?


I've tried and it's genuinely bad, with obvious artifacts. I'm surprised it got released


Can you throw some examples up on Imgur or something?


Sure, here's a sample query: https://imgur.com/a/8UDDac9 These are DALL-E 3, Imagen, and Imagen 2, in that order. I used code based on similar examples from GitHub [1]. According to the docs [2], imagegeneration@005 was released on the 11th, so I guessed it's Imagen 2, though there is no confirmation.

[1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...

[2] https://console.cloud.google.com/vertex-ai/publishers/google...


You've got to be kidding me. This almost deserves its own post.

That last picture is still so horribly bad it's no wonder Google made it almost impossible to access this tech.

How did Google drop the ball on AI like this when they pioneered the entire field?


To be fair, it is not always that awful. Here is a sample of results from a simpler subset of prompts I like to run on image generators: https://imgur.com/a/aO5S7yM Some are bad (the first two), but others are okay; it understands text pretty well, but the artifacts just feel like something from years ago.

I still can't understand how it got released and advertised.


Thanks for the examples.

I've used SD / Midjourney / DALL-E extensively and would say this is honestly shockingly bad, apart from the last two.

Comparable to the first versions of the other services, with better contextual understanding, but still with lots of gnarly artifacts and weirdness going on.


But how do we use it?

Yet another documentation release by Google, promising impressive things that we cannot actually use, while the competition is readily available.


I still cannot believe they missed one of the most critical parts of this release - clear and simple instructions on how to use it. How they even hope to get adoption without that is unclear to me.


It says we can use it with their API. Would be good to have a link to it though.


is there an arxiv paper on how they went from 1 to 2? or any other details?


The prompt "A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile" produced an impressive image. But I ran the same prompt 3 times on my laptop in just a few minutes, and got 3 almost-equally impressive images. (using stable diffusion and a free model called devlishphotorealism_sdxl15)

https://imgur.com/a/4otrN17
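
For anyone who wants to reproduce this, here's roughly what it looks like with diffusers - a sketch using the base SDXL checkpoint rather than the Civitai model I used, so results will differ:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    prompt = (
        "A shot of a 32-year-old female, up and coming conservationist in a jungle; "
        "athletic with short, curly hair and a warm smile"
    )

    # Three runs of the same prompt, mirroring the comparison above.
    images = pipe(prompt, num_images_per_prompt=3).images
    for i, img in enumerate(images):
        img.save(f"conservationist_{i}.png")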


How are two completely different models from different groups, converging on what looks like the exact same person? Number 1 and 3 are eerily similar. I don't understand.



That's an incredibly interesting observation. Thanks for sharing.


It's because the only thing these models can do is rip off existing images, and the prompt is very specific.

"Generative AI" is a learned, lossy compression codec. You should not be surprised that the range of outputs for a given input seems limited.


That makes sense - but in Google's case, I'd expect them to have access to private datasets that would give it something different than public models like SD.



I think you might be misunderstanding. The GP did three runs using one model, each with the same prompt that was used for the Imagen demo image. The outputs are images 1, 3 and 4. Hence the similarity.


Because imgur scrambled the order during upload. /facepalm


Because the central limit theorem applies to web-trained image models.


I really don't understand how they came up with the _exact_ same image. This goes against my previous understanding of how these technologies work, and would appear to lend credence to the "they just regurgitate training material" argument.


Pretty sure they didn't come up with the same image. Images 1, 3, and 4 are the three images the GP generated and they put the Imagen-generated image (2) into the set for ease of comparison.


Ok yes if that is the case then it makes much more sense.


While they are similar in quality, your images have much more of the saturated and high contrast nature of AI generated images, and this is very noticeable to my eye.


I agree, yours are practically identical in quality.


For the peer comments

- https://cloud.google.com/vertex-ai (marketing page)

- https://cloud.google.com/vertex-ai/docs (docs entry point)

- https://console.cloud.google.com/vertex-ai (cloud console)

- https://console.cloud.google.com/vertex-ai/model-garden (all the models)

- https://console.cloud.google.com/vertex-ai/generative (studio / playground)

VertexAI is the umbrella for all of the Google models available through their cloud platform.

It still seems there is confusion (at Google) about whether this is TTP or GA. The docs say both; the studio has a request-access link.

More: this page has a table with features and current access levels: https://cloud.google.com/vertex-ai/docs/generative-ai/image/...

Seems that some features are GA while others are still in early access; in particular, image generation is still EA, or what they call "Restricted GA".
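
If you do get access, the Python SDK route looks roughly like this - an untested sketch based on the model names in the docs above; the exact SDK surface (it lives under a preview namespace) may shift:

    import vertexai
    from vertexai.preview.vision_models import ImageGenerationModel

    # Placeholders: your GCP project and a supported region.
    vertexai.init(project="PROJECT_ID", location="us-central1")

    # imagegeneration@005 appears to be the Imagen 2 version, per the docs linked above.
    model = ImageGenerationModel.from_pretrained("imagegeneration@005")

    response = model.generate_images(
        prompt="a photo of a robin singing on top of a stone wall",
        number_of_images=2,
    )
    for i, image in enumerate(response.images):
        image.save(location=f"robin_{i}.png")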


Why do Google and Amazon overload their data science notebook offerings with a lot of half-baked poorly documented models and features?

Is this just an end-run around incompetent security teams or something?


I'm not sure what you mean. VertexAI is a product in the larger Google Cloud portfolio. It makes sense that they house everything together instead of making disparate platforms for each. This makes authnz consistent for me and simplifies their end too.

In addition to the models, you'll find a host of day-2 features like model monitoring and experiment tracking. Having to vet and pick from 100+ new SaaSes for these is a nice problem to not have.


I think the competition for text-to-image services is over, and open source (Stable Diffusion) won. It doesn't matter how detailed (or whatever counts as "better") corporate text-to-image products get; Stable Diffusion is good enough, and good enough really is good enough. Unlike the corporate offerings, open source txt2img doesn't have random restrictions (no, it's not just porn at this point) and actually allows for additional scripts/tooling/models. If you're attempting to do anything at a professional level or produce an image with specific details via txt2img, you likely have a workflow with txt2img being only step one.

Why bother using a product from a company that is notorious for failing to commit to most of its services, when you can run something which produces output that is pretty close (and maybe better) and is free to run and change and train?


I also think it's over, but I don't see how Stable Diffusion won anything. If anything, I see people flocking en masse to the dalle3/google/amazon/whatever API that is easy to integrate on one side, and consumers paying for Adobe & Canva on the other.

Stable Diffusion is the Linux-on-the-desktop of diffusion models, IMO.

(I agree w/ your comment on trusting Google - pretty sure they'll just phase this off eventually anyway, so I wouldn't bother trying it)


I would totally agree. I've tried to set up Stable Diffusion a couple of times, and even as a professional software engineer working in AI, every time I fail to get good results, get interrupted, lose track, and end up back at DALL-E. I've seen what it can do, I know it can be amazing, but like Linux it has some serious usability issues.


Using this: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Then this: https://civitai.com/

And I have completely abandoned DALLE and will likely never use it again.


I was kind of hoping someone like you would reply - you’re a very kind person. Thank you for taking the time. Excited to try this advice tonight!


On Windows just use https://softology.pro/tutorials/tensorflow/tensorflow.htm

It installs dozens upon dozens of models and related scripts painlessly.


I don't think there are numbers that show "people flocking" to paid offerings vs. free open source, since running your own Stable Diffusion server/desktop doesn't show up on a sales report.

Linux entered the market at a time when paid alternatives were fully established and concentrated, having serviced users/companies for years who had become used to working with them. No paid txt2img offering comes anywhere close to market dominance for image generation. They don't offer anything that isn't available with free alternatives (they actually offer less) and are highly restrictive in comparison. Anyone who is doing anything beyond a disguised DALL-E/Imagen client has absolutely no incentive to use a paid service.


Why did Stable Diffusion win? DALL-E 3 and this are miles ahead in understanding the scene and putting correct text in the right place.

This makes the image much more usable without editing.


DALL-E 3 doesn't have Stable Diffusion's killer feature, which is the ability to use an image as input and influence that image with the prompt.

(DALL-E pretends to do that, but it's actually just using GPT-4 Vision to create a description of the image and then prompting based on that.)

Live editing tools like https://drawfast.tldraw.com/ are increasingly being built on top of Stable Diffusion, and are far and away the most interesting way to interact with image generation models. You can't build that on DALL-E 3.
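
For context, the img2img flow is only a few lines in diffusers - a sketch, assuming an SD 1.5-class checkpoint and a local input image; adjust the model ID to whatever you have:

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Any rough input works: a sketch, a photo, a previous generation.
    init_image = load_image("sketch.png").resize((512, 512))

    result = pipe(
        prompt="a watercolor painting of a lighthouse at sunset",
        image=init_image,
        strength=0.6,        # how far the model is allowed to drift from the input
        guidance_scale=7.5,
    ).images[0]
    result.save("lighthouse_watercolor.png")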


Saying SD is losing or not useful isn't my position.

But it clearly didn't win in many scenarios, especially those that require text to be precise, which happens to matter more in a commercial setting; cleaning up the gibberish text generated by OSS Stable Diffusion seems tiring in itself.


If you’re in charge of graphics in a “commercial setting”, you 100% couldn’t care less about text and likely do not want txt2img to include text at all. #1 it’s about the easiest thing to deal with in Photoshop, #2 you likely want to have complete control over text placement/fonts etc., #3 you actually have to have licenses for fonts, especially for commercial purposes. Using a random font from a txt2img generator can open you up to IP litigation.


I think because most people are used to Dall-E and the Midjourney user experience, they don't know what they're missing. In my experience SD was just as good in terms of "understanding" but offers way more features when using something like AUTOMATIC 1111.

If you're just generating something for fun then DallE/MJ is probably sufficient, but if you're doing a project that requires specific details/style/consistency you're going to need way more tools. With SD/A*1111 you can use a specific model (one that generates images with an Anime style for instance), use a ControlNet model for a specific pose, generate hundreds of potential images (without having to pay for each), use other tools like img2img/inpaint to hone your vision using the images you like, and if you're looking for a specific effect (like a gif for instance), you can use the many extensions created by the community to make it happen.


> Dalle3 and this is miles ahead in understanding scene and put correct text at the right place.

I guess that turns out to be not as important for end users as you'd think.

Anyway, DeepFloyd/IF has great comprehension. It is straightforward to improve that for Stable Diffusion, I cannot tell you exactly why they haven't tried this.


Deepfloyd is slower and needs a lot more memory since it's pixel diffusion.

Also not sure if it can be extended with LORAs or by turning it into a video/3D model the same way an LDM can.


> Why bother using a product from a company that is notorious for failing to commit to most of their services, when you can run something which produces output that is pretty close (and maybe better) and is free to run and change and train?

Because it costs $0.02 per image instead of $1000 on a graphics card and endless buggering around to set up.


You don't even need a GPU anymore unless you care about realtime. A decent CPU can generate a 512x512 image in 2 seconds.

https://github.com/rupeshs/fastsdcpu

https://www.youtube.com/watch?v=s2zSxBHkNE0


$0.02 per image is crazy expensive! Running a higher tier GPU on Runpod is a fraction of the cost (especially if you're pricing per image).

*It also takes like 15 mins to set up (this includes loading the models).


You can use Stable Diffusion on many hosted services out there (e.g. Replicate) for fractions of a cent. 2 cents per image is absurdly expensive; they're anchoring that on the DALL-E 3 price, which likely won't go down because there's little incentive to do so, especially from their stakeholders/partners (Shutterstock, etc.).


if you're interested in exploring providers shadeform will let you compare prices for the same cards across providers*

i'm one of the founders


Stable Diffusion with the right fine-tunes in the hand of a competent user might be the best (if you define "realistic" as best, MidJourney might disagree with that being the only metric). It is good enough that I find it hard to get excited about somebody showing off a new model.

Still, Stable Diffusion is losing the usability, tooling and integration game. The people who care to make interfaces for it mostly treat it as an expert tool, not something for people who have never heard of image generating AI. Many competing services have better out-of-the-box results (for people who don't know what a negative prompt is), easier hosting, user friendly integrations in tools that matter, better hosted services, etc.


Google has as good a track record as anyone else for not shutting down Cloud services. Consumer services are a different category of product.


SD still can't do interactions (between people, objects) as well as DALL-E 3 can. I hope that improves. And unfortunately this isn't like software where we can just slowly build a better open source version; this costs millions to train. I hope that as the hardware and algorithms improve (and perhaps the datasets as well), it won't be that way in the future. Random Kickstarters can get hundreds of thousands easily, and I think we could see something like that with SD as well in the future.


I don't think SD has won the fight. It still doesn't give creators full control of the output. It might be useful to auto-generate some random illustrations, but you need more control if the output is going to be used as an essential asset.


SD can’t give indemnification the way Google and Microsoft can.


I love being Canadian - "Not Available in Canada Due To Regulatory Uncertainty"


Same here. Nothing AI-related from Google is available in Canada. This sucks.

To add insult to injury, they have nice press releases and demos of their latest AI, but it isn't easily accessible or available until next year. The press and Wall Street gobble it up and the stock rises. Is it just for them?


Anthropic doesn't allow Canadians access to their AI services either. I haven't had the chance to check out if I can get access to Claude via Amazon Bedrock - but that might be an option. My company is already on AWS and currently they are thinking of dipping their toes into using AI for our software next year, so I might get to play around with it yet. It'll probably either be OpenAI integration directly, or going with something that's available as a hosted service on AWS.

OpenAI services are available in Canada but as an individual, $27/mo for ChatGPT Plus and then paying per use for the API is kinda a hard sell for me.

I'm needing a hardware refresh soon, so I think i'm just going to run the open source stuff locally once I get around to figuring out how to set that all up.


The translation is that the government engaged in a shakedown of Google, on behalf of Bell and Rogers: Bill C-18. It was disgusting and corrupt, and I'm glad that Google and Facebook pushed back.

This has recently been resolved, though, with a compromise deal, so hopefully these services will soon be available here.


Unfortunately Google caved and are giving away 100 million dollars. I wish they had a spine like Meta. I generally despise both companies (Meta more than Google) but the enemy of my enemy can be my friend at arm's length.


The amount and terms agreed upon are what Google originally offered, so I've mostly seen it reported as the gov't caving.

But I still agree with you - would rather have seen Google not give in to this sort of thing at all.

It was very different for Meta - they already don't like sending people away from their site so it was much easier for them to hold out.



I hate that you need a Google account to use it. I generally don't mind creating yet another account on the internet, since one can easily create such accounts with temp email addresses, for example; but with Google it's trickier (sometimes they even ask for a mobile phone number and all when signing up), and I prefer not to have a dummy Google account which I use alongside my real Google account, for fear of being locked out (e.g., Google may think "this guy has two accounts, same computer, same IP… let's ban him").


> (e.g., google may think “this guy has two accounts, same computer same ip… let’s ban him”)

FWIW I think I have 5+ google accounts. Have had them since gmail was in beta and have never been banned


Considering Google was caught faking stuff during the recent Gemini introduction, I'll take this with a big grain of salt, doubly so considering they don't have a way for people to try it out.


Name a corporation that hasn't embellished their corporate tech demos.


OpenAI


From their last product release:

> As always, you are in control of your data with ChatGPT.

Which is a flat-out lie. You can allegedly opt-out of them using your data for training, but you are still sending your data to a private corporation for processing/etc. which makes it totally unsuitable for handling sensitive or restricted data.


Fair enough.


To all the people saying “this sucks because we can’t use it” — there’s no real value in Google releasing this vs just making the announcement. This space is a race to the bottom, and there’s no significant profit being created in image gen right now (even if the service generates cashflow, the training and inference cost is insane). For the sake of team morale and legal risk, this announcement is totally enough, better to keep training models and focus on the next announcement…


We can use it. It's generally available. We just can't find the page that explains how to use it or lets us test it.


Only for trusted testers.



