Maybe I’m not seeing it right, but comparing the source of Apple’s Whisper port to the original Python Whisper, it seems there are only minimal changes, mostly redirecting certain operations to MLX.
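To illustrate what I mean by "redirecting operations": a minimal, hypothetical sketch of the pattern, where a heavy op like matmul is dispatched to MLX (Apple's Metal-backed array framework) when it's importable, and falls back to NumPy otherwise. This isn't Apple's actual code, just the general shape of such a port.

```python
import numpy as np

try:
    import mlx.core as mx  # Apple's array framework; Metal-backed on Apple Silicon
    HAS_MLX = True
except ImportError:
    HAS_MLX = False

def matmul(a, b):
    """Redirect the heavy op to MLX when available, else fall back to NumPy."""
    if HAS_MLX:
        out = mx.matmul(mx.array(a), mx.array(b))
        mx.eval(out)  # MLX is lazy; force the computation
        return np.array(out)
    return a @ b
```

The point is that a port like this changes where the math runs, not what the model does.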
There is also whisper.cpp (https://github.com/ggerganov/whisper.cpp), which has its own kind of optimizations for Apple Silicon - I don’t think that was the version used against Nvidia in the test.
I don't think Whisper was specifically optimized for Apple Silicon. Doesn't it just use MLX? If using a platform's API counts as being specifically optimized, then the Nvidia version is "optimized" too, since it's presumably using CUDA.
ETA: actually it's unclear from the article whether the Whisper optimizations were done by Apple engineers, but it's definitely an optimized version.