
GPT-3.5 likely differs from the original GPT-3 by more than instruction fine-tuning. For example, it was probably retrained following the Chinchilla scaling laws [1], on a lot more data and perhaps with a somewhat smaller parameter count.
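
To make the "more data, fewer parameters" point concrete, here is a rough back-of-the-envelope sketch. It assumes two common approximations rather than anything OpenAI has published about GPT-3.5: training compute C ≈ 6·N·D (N parameters, D tokens), and the Chinchilla rule of thumb of roughly 20 tokens per parameter. The 175B parameters and 300B training tokens are the figures reported for the original GPT-3 [2]; the function names are just illustrative.

```python
# Illustrative only: assumes C ≈ 6 * N * D and a Chinchilla-style
# rule of thumb of ~20 training tokens per parameter (D ≈ 20 * N).
TOKENS_PER_PARAM = 20


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs: C ≈ 6 * N * D."""
    return 6 * params * tokens


def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a compute budget into (params, tokens) under D = 20 * N.

    From C = 6 * N * (20 * N) = 120 * N**2, it follows that N = sqrt(C / 120).
    """
    params = (flops / (6 * TOKENS_PER_PARAM)) ** 0.5
    return params, TOKENS_PER_PARAM * params


gpt3_params = 175e9  # original GPT-3 size
gpt3_tokens = 300e9  # training tokens reported for the original GPT-3
budget = training_flops(gpt3_params, gpt3_tokens)

n_opt, d_opt = compute_optimal(budget)
print(f"GPT-3 training budget: {budget:.2e} FLOPs")
print(f"tokens per parameter actually used: {gpt3_tokens / gpt3_params:.1f}")
print(f"Chinchilla-style split: {n_opt / 1e9:.0f}B params, {d_opt / 1e9:.0f}B tokens")
```

Under these assumptions, the original GPT-3 used under 2 tokens per parameter, and the same compute budget would instead favor a model of roughly 50B parameters trained on about 1T tokens. That is the direction the comment describes, not a claim about GPT-3.5's actual size.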

There are many variants of GPT-3 and GPT-3.5. Based on the performance numbers in Meta’s paper, it looks like they’re comparing against the very first version of GPT-3, from 2020 [2].

[1] https://arxiv.org/abs/2203.15556

[2] https://arxiv.org/abs/2005.14165