Did you read the sentence right after he makes that claim?
> At Cloudinary, we recently performed a large-scale subjective image quality assessment experiment, involving over 40,000 test subjects and 1.4 million scores.
This isn't about your, my, or Jon Sneyers's individual opinions on these photos; it's the consensus of 40,000 test subjects. And yeah, I agree that for a significant number of photos I disagree with that consensus. But given that we're both posting on HN, we're probably also the kind of people who like to prove others wrong by actively hunting for counterexamples and overvaluing them, so keep that in mind. Because, to be honest, for a large number of photos I do agree with the consensus.
Psychovisual image quality tests are difficult and complicated. Selection of the population, methods, guiding text, motivation, etc. can lead to strange things happening.
As an example of the difficulties, a leading academic corpus, TID2013, has a reversal in quality between categories 4 and 5: the highest-quality images seem to get slightly worse quality ratings than the next-highest-quality images.
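To make that concrete, here's a minimal sketch of what such a reversal looks like in the data and how a monotonicity check would flag it. The mean opinion scores below are made up for illustration, not TID2013's actual numbers:

```python
import numpy as np

# Hypothetical mean opinion scores (MOS) per nominal quality level,
# level 1 = most distorted ... level 5 = least distorted.
# These numbers are invented purely to illustrate a rank reversal.
mos_by_level = {
    1: [2.1, 2.3, 1.9],
    2: [3.0, 3.2, 2.8],
    3: [3.9, 4.1, 3.8],
    4: [4.6, 4.7, 4.5],   # next-highest quality level
    5: [4.4, 4.5, 4.3],   # highest quality level, yet rated slightly lower
}

means = {level: np.mean(scores) for level, scores in mos_by_level.items()}
levels = sorted(means)

# A well-behaved test should give MOS that increases monotonically with quality.
for lo, hi in zip(levels, levels[1:]):
    if means[hi] < means[lo]:
        print(f"Reversal: level {hi} rated {means[hi]:.2f} "
              f"vs level {lo} rated {means[lo]:.2f}")
```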
I have seen a leading image quality laboratory produce substantially different results for the same codec when the person conducting the experiments changed.
There are strong opinions about whether images should be reviewed with one image pixel mapped to one monitor pixel, or zoomed the way they are in practical use, particularly on mobile.
Should zooming be allowed? Should people be allowed to move closer to the monitor if they want as part of the viewing? etc. etc.
Yes, I agree it is incredibly hard. On the other hand, if anyone has put in the work to give the "least bad answer so far" to this question, it's Jon Sneyers.
I don't have time to dig through Twitter just now, but instead of just picking one image quality metric, they used all of them, and also used them to compare the testing methods themselves.
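A rough sketch of the idea, with entirely hypothetical data (this is not their actual pipeline): if you have scores from several subjective test setups and values from several objective metrics on the same images, you can compare the setups by how consistently each one correlates with the whole metric suite, e.g. via Spearman rank correlation:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images = 50

# Hypothetical setup: an underlying "true" quality per distorted image,
# a small suite of objective metrics that track it with some noise,
# and two subjective test methods with different amounts of noise.
# All numbers are made up purely to illustrate the comparison.
true_quality = rng.uniform(1.0, 5.0, n_images)

objective_metrics = {
    "metric_a": true_quality + rng.normal(0.0, 0.3, n_images),
    "metric_b": true_quality + rng.normal(0.0, 0.3, n_images),
    "metric_c": true_quality + rng.normal(0.0, 0.5, n_images),
}

test_methods = {
    "crowdsourced_pairwise": true_quality + rng.normal(0.0, 0.4, n_images),
    "lab_absolute_rating":   true_quality + rng.normal(0.0, 1.2, n_images),
}

# Score each test method by its average rank correlation with the whole
# metric suite, instead of anchoring everything to a single metric.
for method, mos in test_methods.items():
    rhos = [spearmanr(mos, values).correlation
            for values in objective_metrics.values()]
    print(f"{method}: mean Spearman rho vs metric suite = {np.mean(rhos):.2f}")
```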
All my own experience and observations also suggest that Jon did an excellent job on this and is likely the first person to ever do it well with crowd-sourcing. In the past I have seen similar quality of results with hand-picked experts (image quality people, photographers etc.), but not with volunteers or crowd-sourcing.