Driving the display is just one part of it. They also need to do highly accurate SLAM, object recognition (i.e. machine learning inference), and, if the visuals are interactive rather than pre-rendered, all the lighting, texturing, rigging, physics, etc. of the rendered models. All of this on the same CPU/GPU.
It's only just about doable on desktop VR setups with the latest desktop GPUs.
With their budget, they can afford to build a custom SLAM ASIC, put more cores into their SoC, and design a specialized GPU. Such a capital-heavy ASIC approach could give a 10-100x performance boost.
Still, I agree that sounds infeasible.