
Isn't the space you're talking about the input images that are close to the textual prompt?

These models are trained on image+text pairs, so if you prompt something like "an apple" you get, roughly, a conceptual average of all the training images containing apples. Depending on your dataset, that's likely going to be a photograph with an apple in the center.
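
To make the "conceptual average" idea concrete, here's a toy sketch. It is only an illustration of the intuition, not how any real text-to-image model works internally: the dataset, the two made-up features (how centered the object is, how photo-like the image is), and the `conditional_mean` helper are all hypothetical.

```python
# Toy illustration only: real models do not literally average their training images.
# Each "image" is a hypothetical feature pair: (object_centeredness, photo_likeness).
dataset = [
    ("a photo of an apple on a table", (0.9, 1.0)),
    ("apple, studio photograph", (1.0, 1.0)),
    ("a drawing of an apple in the corner", (0.2, 0.0)),
    ("a dog in a park", (0.5, 1.0)),
]


def conditional_mean(pairs, word):
    """Average the feature vectors of every image whose caption mentions `word`."""
    matches = [vec for caption, vec in pairs if word in caption.lower()]
    if not matches:
        raise ValueError(f"no captions mention {word!r}")
    dims = len(matches[0])
    # Per-dimension mean over the matching images only.
    return tuple(sum(vec[i] for vec in matches) / len(matches) for i in range(dims))


apple_average = conditional_mean(dataset, "apple")
print(apple_average)
```

Since most of the apple captions in this made-up dataset belong to centered photographs, the average lands closer to "centered photo" than to "drawing in the corner", which is the dataset dependence I mean.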


