vlovic's comments

The answer is Bell's Theorem. Scott Aaronson has some good explanations.


I won't have time to read this until later, but here's a post by him on it: https://scottaaronson.blog/?cat=33


Do I understand this correctly: they just took BLIP-2 and replaced the LLM with Vicuna, and to do that they added only a single linear layer to translate between the frozen vision encoder and the (frozen) Vicuna? Additionally, and importantly, they manually created a high-quality dataset for finetuning their model.

If that is the case, then this is really a very, very simple paper. But I guess simple things can lead to great improvements, and indeed their results seem very impressive. It goes to show how much low-hanging fruit there must be in deep learning these days, by leveraging the amazing, and amazingly general, capabilities of LLMs.
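For intuition, here is a minimal NumPy sketch of the composition described above: a single trainable linear projection mapping frozen vision-encoder tokens into the frozen LLM's embedding space. The dimensions and token count are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

# Assumed dimensions for illustration only (not the paper's exact values).
VISION_DIM = 768   # feature dimension of the frozen vision encoder's outputs
LLM_DIM = 4096     # embedding dimension of the frozen LLM (e.g. Vicuna-scale)

rng = np.random.default_rng(0)

# The only trainable component: one linear layer (weights + bias).
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
b = np.zeros(LLM_DIM)

def project(vision_tokens: np.ndarray) -> np.ndarray:
    """Map frozen vision-encoder tokens into the LLM's embedding space,
    where they act like 'soft prompt' embeddings prepended to the text."""
    return vision_tokens @ W + b

# e.g. 32 visual tokens from the frozen encoder -> 32 LLM-space embeddings
vision_tokens = rng.standard_normal((32, VISION_DIM))
llm_inputs = project(vision_tokens)
print(llm_inputs.shape)  # (32, 4096)
```

During finetuning, only `W` and `b` would receive gradients; both the vision encoder and the LLM stay frozen, which is what makes the approach so cheap.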


Yes, model composability magic.


