
Even the first version was described by OpenAI as GPT: "DALL·E is a 12-billion parameter version of GPT-3". But I'm not sure it's the GPT-3 that we know (i.e., the one trained on all that text), or just the same/similar architecture. In other places they mention CLIP as a part of the model (I'm still talking about the first version). All of this is quite confusing to me, and leaves me wondering whether the relationship is simply "same architecture", "reuse of embeddings", "use in training", or "fine-tuning", especially for the exciting new version.
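For what it's worth, the published description of DALL·E 1 suggests "version of GPT-3" means the same decoder-only transformer architecture trained from scratch on a single stream of text tokens followed by image tokens, with CLIP trained separately and used only to rerank the sampled candidates. A rough conceptual sketch of that pipeline (toy stand-in functions, hypothetical names, not OpenAI's actual code):

```python
# Conceptual sketch of the DALL·E 1 pipeline as described publicly:
# a GPT-style transformer autoregressively extends text tokens with
# image tokens; a separate CLIP model reranks the finished candidates.
# next_token() and clip_score() are toy stand-ins, not real models.
import random

IMAGE_VOCAB = 8192   # size of the discrete image-token codebook
IMAGE_LEN = 1024     # 32x32 grid of image tokens

def next_token(sequence):
    """Stand-in for the transformer's next-token sampling step."""
    return random.randrange(IMAGE_VOCAB)

def generate_image_tokens(text_tokens):
    """Autoregressively append image tokens after the text prompt."""
    seq = list(text_tokens)
    image_tokens = []
    for _ in range(IMAGE_LEN):
        tok = next_token(seq)
        seq.append(tok)
        image_tokens.append(tok)
    return image_tokens

def clip_score(text_tokens, image_tokens):
    """Stand-in for CLIP: scores text/image agreement, used only to rerank."""
    return random.random()

def sample_and_rerank(text_tokens, n_candidates=4):
    """Draw several candidates, keep the one CLIP scores highest."""
    candidates = [generate_image_tokens(text_tokens) for _ in range(n_candidates)]
    return max(candidates, key=lambda img: clip_score(text_tokens, img))
```

On this reading, CLIP is not "part of" the generative model at all; it is a filter applied after sampling, which may be why the write-ups mention it alongside DALL·E without it appearing in the architecture itself.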

