
> This is the beautiful part - a mere multiplication is enough to convert the image tensor to text tensor. One freaking line of code, and a simple one.

I thought they were creating image tokens from the queries during fine-tuning and appending them to the language model's input. They are not text tokens.
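Here is a minimal NumPy sketch of the distinction. It assumes a LLaVA-style linear projection, which may differ from the model under discussion. The multiplication maps vision-encoder features into the language model's embedding space, and the resulting vectors are concatenated with the embeddings of real text tokens, so they never become vocabulary ids. The sizes, the random weights, the placement of image vectors before the text, and the names `build_inputs`, `W`, and `embedding_table` are all hypothetical.

```python
import numpy as np

def build_inputs(image_features, W, embedding_table, text_ids):
    """Return the sequence of input vectors the language model would consume.

    image_features:  (num_patches, vision_dim) output of a vision encoder
    W:               (vision_dim, lm_dim) learned projection (random stand-in here)
    embedding_table: (vocab_size, lm_dim) the LM's token embedding matrix
    text_ids:        (num_text_tokens,) integer token ids
    """
    # The "one line": project each patch vector into the LM embedding space.
    image_tokens = image_features @ W
    # Real text tokens are looked up by id in the embedding table.
    text_embeds = embedding_table[text_ids]
    # Image vectors are continuous and are not entries in the vocabulary,
    # so they are "image tokens" in embedding space, not text tokens.
    # Placing them before the text is an assumption of this sketch.
    return np.concatenate([image_tokens, text_embeds], axis=0)

rng = np.random.default_rng(0)
num_patches, vision_dim, lm_dim, vocab_size = 16, 64, 128, 1000  # toy sizes
image_features = rng.standard_normal((num_patches, vision_dim))
W = rng.standard_normal((vision_dim, lm_dim)) * 0.02
embedding_table = rng.standard_normal((vocab_size, lm_dim))
text_ids = np.array([1, 42, 7, 99])

inputs_embeds = build_inputs(image_features, W, embedding_table, text_ids)
print(inputs_embeds.shape)  # (20, 128)
```

In a query-based design such as BLIP-2's Q-Former, a fixed set of learned queries first cross-attends to the patch features. The number of image tokens then equals the number of queries rather than the number of patches, but the final projection into the language model's embedding space works the same way.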


