I think you're right about the current limitations, but imagine a trillion or ten trillion parameter model trained and RLHF'd for this specific use case. It may take a year or two, but I see no reason to think it isn't coming.
Yes, hardware requirements will be steep, but it will still be cheap compared to equivalent human illustrators. And compute costs will come down in the long run.