The query in the example is not “count” but a count + group by query. While it is possible to precompute group by results on every possible columns per page we don’t do that and this query does touch every row. The article touches on how this is possible: operations on encoded data, AVX2, not using a hash table for the intermediate result of the group by. And we certainly don’t fake it with storing query results.
What does the columnar format look like? Particularly, is the group by column compressed with RLE? That’s kind of a pre-computed group-by + count that would make this kind of query very very fast :)
This is an astute observation. Here is the accurate answer from the author of the feature:
“We do have special handling for RLE for filters and group by column but in both cases there would be per row component (updating selection vector for filter on RLE, updating aggregates for group by on RLE) and so far I saw RLE encoding usually slower than dictionary encoding for both these cases”
We guarantee that the result is legit.