
Sensational title that misrepresents the message of the paper.

However, when conducting more targeted automatic evaluations, we found that the imitation models close little to none of the large gap between LLaMA and ChatGPT. In particular, we demonstrate that imitation models improve on evaluation tasks that are heavily supported in the imitation training data. On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support. For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.

Even if this might not be the way to replicate ChatGPT's performance across all tasks, it seems to work quite well on whichever tasks are covered in the imitation training data. That is still a big win.
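
For concreteness, here is a minimal sketch of what "training exclusively on ChatGPT responses for Natural-Questions-like queries" looks like in practice: plain supervised fine-tuning of a smaller open model on (query, response) pairs collected from the stronger model. This is not the paper's actual code; the model name (gpt2 as a stand-in for LLaMA) and the toy data are placeholders.

    # Sketch of task-targeted imitation fine-tuning: train a small causal LM
    # on (query, ChatGPT-style answer) pairs for one task family.
    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for LLaMA; any causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical task-targeted imitation data: Natural-Questions-like
    # queries paired with responses collected from the stronger model.
    pairs = [
        ("Q: Who wrote Hamlet?\nA:", " William Shakespeare."),
        ("Q: What is the capital of France?\nA:", " Paris."),
    ]

    def collate(batch):
        texts = [q + a + tokenizer.eos_token for q, a in batch]
        enc = tokenizer(texts, return_tensors="pt", padding=True)
        # Standard causal-LM objective: labels are the input ids,
        # with padding positions masked out of the loss.
        labels = enc["input_ids"].clone()
        labels[enc["attention_mask"] == 0] = -100
        enc["labels"] = labels
        return enc

    loader = DataLoader(pairs, batch_size=2, collate_fn=collate)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    model.train()
    for epoch in range(3):
        for batch in loader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f"epoch {epoch} loss {loss.item():.3f}")

The paper's point is that this recipe moves the needle only on the task family the imitation data covers; the same pairs do nothing for unrelated evaluations.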

Later on, the paper shows this also works for factual correctness (leaving aside the argument over whether imitation is the right approach to factuality).




To be fair, this paper has been made obsolete in its entirety by recent research. It's not really the authors' fault, but folks need to start publishing faster, as posters or something, if they want to stay relevant.

A better title, knowing what we know now, might be "To outperform GPT-4, do more than imitate".


Link to said research?



