There is also a comparison against HotSpot, which is a hand-written GPU program (albeit not a perfectly optimised one).
Futhark does not outperform expertly hand-optimised GPU code, but most GPU code found in the wild is hardly expertly hand-optimised. Futhark comes out on top surprisingly often, but it can be solidly beaten on complex algorithms or by clever implementations. See Figure 8 in this paper for examples: https://futhark-lang.org/publications/ppopp19.pdf
That's fair, but it tells you nothing about the performance gap introduced by these high-level abstractions.