According to Meta's benchmarking [0], it is comparable on many metrics. I haven't used it myself, so I can't say whether that holds up in actual use.
I don't understand this topic well, but given the premise that GPT-3 and ChatGPT differ only in that ChatGPT adds RLHF (Reinforcement Learning from Human Feedback), and that LLaMA 7B is comparable to GPT-3 on a number of metrics, it would follow that improving LLaMA 7B with RLHF would yield a model similar to ChatGPT. Is that correct?
You're likely right that applying RLHF (plus instruction fine-tuning) to LLaMA 7B would produce results similar to ChatGPT, but I think you're implying that this would be feasible today.
RLHF requires a large amount of human feedback data, and IIRC there's no open dataset for that right now.
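To make concrete what that feedback data is used for: the usual recipe trains a reward model on pairs of responses ranked by humans, then optimizes the language model against that reward (typically with PPO). Below is a minimal sketch of the pairwise, Bradley-Terry-style reward-model loss in PyTorch; the scalar rewards are made up for illustration, and this is not anyone's actual training code.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise (Bradley-Terry style) loss: push the reward of the
    # human-preferred response above the reward of the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy batch of 3 comparisons. In practice these scalars would come from a
# reward-model head on top of the base LM, scoring (prompt, response) pairs.
chosen = torch.tensor([1.2, 0.3, -0.5], requires_grad=True)
rejected = torch.tensor([0.4, 0.9, -1.0], requires_grad=True)

loss = preference_loss(chosen, rejected)
loss.backward()
print(loss.item())
```

The hard part isn't this loss function; it's collecting enough human comparisons for the reward model to generalize, which is exactly the data that isn't openly available.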
And they've already collected over 100,000 samples; IIRC ChatGPT was trained on something like 30,000 samples, so the open models should already be positioned to succeed.
> You mean train it on ChatGPT's output? That's against OpenAI's terms of service.
Oh no, someone call the internet police.
I'm sure scraping tons and tons of images and web data to train DALL-E and GPT, and then selling access to the results, was also against many licenses and terms of service, but OpenAI did it anyway.
None of these AIs were created ethically. At the very least we can make sure these huge models don't belong solely to monopolistic tech companies, and democratize their power.
GPT-3.5 likely differs from the original GPT-3 by more than instruction fine-tuning. For example, it was probably retrained following the Chinchilla scaling laws [1], with a lot more data and maybe a somewhat smaller parameter count.
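A rough back-of-the-envelope check shows why that's plausible. This uses the well-known ~20-tokens-per-parameter heuristic from the Chinchilla paper and the published GPT-3 figures; it is not a claim about how GPT-3.5 was actually trained.

```python
# Rough Chinchilla-style sanity check. The 20-tokens-per-parameter rule of
# thumb is from the Chinchilla paper [1]; the GPT-3 numbers are from the
# original 2020 paper. None of this is confirmed about GPT-3.5.
params_gpt3 = 175e9      # original GPT-3 parameter count
tokens_gpt3 = 300e9      # tokens GPT-3 was reportedly trained on

tokens_per_param = tokens_gpt3 / params_gpt3      # ~1.7
chinchilla_optimal_tokens = 20 * params_gpt3      # ~3.5e12

print(f"GPT-3 tokens per parameter: {tokens_per_param:.1f}")
print(f"Chinchilla-optimal tokens at 175B params: {chinchilla_optimal_tokens:.1e}")
```

By that heuristic the original GPT-3 was trained on an order of magnitude fewer tokens than compute-optimal, so retraining with far more data, or spending the same compute on a smaller model, would be the obvious move.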
There are many variants of GPT-3 and GPT-3.5, and based on the performance numbers in Meta's paper, it looks like they're comparing against the very first version of GPT-3 from 2020 [2].