This is very silly. You're not doing the challenge if you do the work up front. ...

philbe77 · 2025-10-24T15:17:34 1761319054

You should do it then, and post it here. I did do it with one machine as well: https://gizmodata.com/blog/gizmosql-one-trillion-row-challen...

NorwegianDude · 2025-10-24T15:50:31 1761321031

Nobody cares whether I can do it a million times faster; anyone can. It's cheating.

The whole reason you have to account for the time you spend setting it up is so that all work spent processing the data is timed. Otherwise we can just precompute the answer and print it on demand, which is very fast and easy.

Just getting it into memory is a large bottleneck in the actual challenge.

If I first put it into a DB with statistics that track the needed min/max/mean, then it's basically instant to retrieve, but it's also slower to set up, because that work still has to be done somewhere. That's why the challenge is timed from file to result.
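
To make the point concrete, here is a rough sketch of the two ways to run the clock. It assumes a tiny text file with one `station;temperature` row per line; that format, the file name, and the names `aggregate` and `run_demo` are made up for illustration and are not taken from the challenge rules or from anyone's actual setup.

```python
import os
import tempfile
import time
from collections import defaultdict


def aggregate(path):
    """Scan the raw file once and return {station: (min, mean, max)}."""
    # Per station: [min, max, sum, count]
    stats = defaultdict(lambda: [float("inf"), float("-inf"), 0.0, 0])
    with open(path, encoding="utf-8") as f:
        for line in f:
            station, sep, value = line.rstrip("\n").partition(";")
            if not sep:
                continue  # skip blank or malformed lines
            v = float(value)
            s = stats[station]
            s[0] = min(s[0], v)
            s[1] = max(s[1], v)
            s[2] += v
            s[3] += 1
    return {k: (s[0], s[2] / s[3], s[1]) for k, s in stats.items()}


def run_demo():
    # Hypothetical sample data, only here so the sketch runs on its own.
    rows = "Oslo;1.0\nBergen;-2.0\nOslo;3.0\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "measurements.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(rows)

        # File to result: the clock covers reading, parsing and aggregating.
        start = time.perf_counter()
        honest = aggregate(path)
        file_to_result = time.perf_counter() - start

    # "Work up front": the aggregation above becomes untimed setup,
    # and only the lookup is put on the clock.
    precomputed = dict(honest)
    start = time.perf_counter()
    cheat = precomputed["Oslo"]
    lookup_only = time.perf_counter() - start

    return honest, cheat, file_to_result, lookup_only


if __name__ == "__main__":
    honest, cheat, file_to_result, lookup_only = run_demo()
    print(f"file to result: {file_to_result:.6f}s")
    print(f"lookup only:    {lookup_only:.6f}s")
```

Both paths return the same numbers. The only difference is which part of the work the timer is allowed to see, and that is the whole disagreement.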