I have a hard time believing that the differentiating factor between the top models is data volume. Sure, Google has the entire internet mirrored internally, but the other vendors' models are also trained on unfathomably large collections. Clever engineering is doing more to improve performance. I doubt DeepSeek had better English datasets than OpenAI or Google.