With RSA, the size/computational cost of normal usage also grows only polynomially in the key size. But even with that in mind, a quantum computer that can crack keys in polynomial time is still a problem: you can increase the key size, but the operator of the quantum computer can enlarge their machine in turn, a true cat-and-mouse game. That's unlike the current situation, where usage cost grows polynomially while the cost of cracking grows exponentially. That difference is the reason we can pick key sizes where signing/decryption takes a millisecond while cracking it (presumably) takes billions of years.
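To put rough numbers on that gap (purely as a back-of-envelope illustration): the best known classical attack, the general number field sieve, has a heuristic cost of about exp((64/9)^(1/3) * (ln N)^(1/3) * (ln ln N)^(2/3)). The sketch below drops the o(1) term and all constants, so treat the output as order-of-magnitude only:

    import math

    def gnfs_work_bits(modulus_bits):
        # Heuristic GNFS cost L_N[1/3, (64/9)^(1/3)], with the o(1) term dropped.
        # Returns the estimate as a base-2 exponent ("bits of work").
        ln_n = modulus_bits * math.log(2)
        c = (64 / 9) ** (1 / 3)
        return c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3) / math.log(2)

    for bits in (1024, 2048, 3072, 4096):
        # Doubling the key size adds far more to the attack cost than to the
        # (polynomial) cost of signing/decrypting with the key.
        print(f"RSA-{bits}: roughly 2^{gnfs_work_bits(bits):.0f} classical operations to factor")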
Encountering this by chance is exceedingly unlikely, of course, if p and q are randomly generated. In probability terms it amounts to the first half of the bits of p (or q) all being zero (apart from the leading 1), so roughly 2^(-n/4), where n is the bit size of the modulus. For RSA-2048 the probability of this happening is on the order of 2^-512, or in base-10 terms 0.0000000...0000001, with roughly 150 zeros before the one!
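Just to put that number in concrete terms, the arithmetic is quick to check:

    import math

    # The 2^-(n/4) event above, for RSA-2048 (n = 2048, so p and q are 1024 bits each).
    n = 2048
    prob = 2.0 ** -(n // 4)            # 2^-512, still representable as a double
    print(prob)                        # ~7.5e-155
    print((n // 4) * math.log10(2))    # ~154, i.e. roughly 150 zeros before the first digit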
I've seen many implementations that relied on iterating upward from a random starting point (or rather, they used a prime sieve, but it amounts to the same thing in terms of the skewed distribution). While maybe not a problem in practice, I always hated it - even for embedded systems etc. I always used pure rejection sampling, drawing a fresh random candidate in each iteration; a sketch of what I mean is below.
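A minimal sketch, using sympy's isprime as a stand-in for whatever primality test the implementation actually uses:

    import secrets
    from sympy import isprime   # stand-in; a real implementation would use its own Miller-Rabin

    def random_prime(bits):
        # Pure rejection sampling: draw a completely fresh candidate each time,
        # rather than picking one starting point and walking upward until a prime
        # is hit (which over-weights primes that sit after large prime gaps).
        while True:
            candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1   # force top bit and oddness
            if isprime(candidate):
                return candidate

    p = random_prime(1024)
    q = random_prime(1024)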
As I understand it, the number of rounds needed (for a given acceptable failure probability) goes down as the number being tested gets larger. Note (very importantly) that this assumes you are testing a RANDOM integer for primality. If you are given an integer from a potentially malicious source, you need to do the full number of rounds for the given security level.
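For reference, a bare-bones Miller-Rabin where the round count is the knob being discussed. The 4^-rounds worst-case bound is what you must rely on for adversarial inputs; the much better average-case bounds (Damgard-Landrock-Pomerance) are what justify fewer rounds for random candidates:

    import secrets

    def miller_rabin(n, rounds):
        # Probabilistic primality test with `rounds` random bases.
        # Worst case (adversarial n): error probability at most 4**-rounds.
        if n < 2:
            return False
        for small in (2, 3, 5, 7, 11, 13):
            if n % small == 0:
                return n == small
        d, s = n - 1, 0
        while d % 2 == 0:           # write n - 1 = d * 2^s with d odd
            d //= 2
            s += 1
        for _ in range(rounds):
            a = secrets.randbelow(n - 3) + 2        # random base in [2, n-2]
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False        # witness found: definitely composite
        return True                 # no witness found: probably prime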
I still see a lot of RSA, especially for encryption. While ECC can be used for encryption (e.g. via ECIES), it is more convoluted and much less widely supported in hardware, software etc. than RSA encryption/decryption.
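As an illustration of the "less convoluted" point: with RSA, public-key encryption is one call with a padding mode, whereas an ECC scheme like ECIES has to bolt on an ephemeral key exchange, a KDF and a symmetric cipher. A sketch using the pyca/cryptography package (the key size and hash choices are just example parameters):

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    ciphertext = public_key.encrypt(b"session key or other short message", oaep)
    plaintext = private_key.decrypt(ciphertext, oaep)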
The performance of this can matter in some scenarios. On embedded systems, smart cards etc., generating the primes can take a significant amount of time (15-20 seconds is typical) even with the 'low' number of iterations; a higher iteration count means the user has to wait even longer. In fact, on such systems it is not unusual to see occasional time-out errors when the smart card is unlucky in finding its primes, and the timeout value may be a fixed, general one in a part of the stack that is difficult to change for this specific call.
Another case is short-term keys/certificates, where a fresh key pair is generated, and a certificate issued, for each and every signature. This setup makes revocation easier to handle (the certificate typically has a lifetime of a few minutes, so it is effectively 'revoked' almost immediately).
There are also scenarios where the keys are generated on a central system (HSMs etc.) to be injected into a production line or similar. There, performance can matter as well. I worked on a system where HSMs were used for this: they could typically generate an RSA key in 1-2 seconds, so 12 HSMs were needed to keep up with demand - and that was again with the reduced number of rounds.
Thanks for the reply (it definitely answers my original question). For both short-lived keys and cases where production time matters, it makes perfect sense. I was mostly thinking about the scenario where you're generating a key for e.g. a long-lived certificate (maybe even a CA) or high-stakes PGP use. It just seems like you'd want to spend a few more seconds in that case.