GPT 3.5 likely differs from the original GPT 3 by more than instruction fine-tuning. For example, it was probably retrained under Chinchilla scaling laws [1], with a lot more data and maybe a somewhat smaller parameter count.
There are many variants of GPT-3 and GPT-3.5, and based on the performance numbers in Meta’s paper, it looks like they’re comparing against the very first version of GPT-3 from 2020. [2]
There are many variants of GPT-3 and GPT-3.5, and based on the performance numbers in Meta’s paper, it looks like they’re comparing against the very first version of GPT-3 from 2020. [2]
[1] https://arxiv.org/abs/2203.15556
[2] https://arxiv.org/abs/2005.14165