Thanks but I saw the numbers, however to me the numbers just say "It's terrible" and leave me with the same question I had before, what's actually on the stack ? It's not going to just be empty space left there as a joke, are there... cached values? Pointers to something (what?). I suspect it's supposed to be obvious but I don't get it.
My "Eric's Famous Pythagorean Triples" blog post is 5+ years old and possibly not addressing exactly the same issue as think-cell/Google cares about, but, you might still be interested to read it:
https://quuxplusone.github.io/blog/2019/03/06/pythagorean-tr...
In my case, the blowup happens mainly because the Ranges style tends to turn easily fungible code like `if x < y` into relatively non-fungible data like the capture-list of `[y]() { return x < y; }`. The former is easily subjected to optimizations like the hoisting of loop invariants; the latter is not. Another way to put this is: The compiler is happy to mess with the layout of the function stack frame up until the very last minute; and to mess with the coroutine frame almost as long; but the layout of a lambda (or std::bind) capture frame is set-in-stone very early, by the front-end. And Ranges loves lambdas.
According to https://en.cppreference.com/w/cpp/ranges/filter_view return value of views::filter is either a new ranges::view or some sort of range adaptor that seems to be equivalent to a ranges::filter_view.
I can only guess that it is those return values that are temporarily put on stack so that the consequent | operations could be chained. My hypothesis, without knowing much about this stuff, is that since the compiler cannot elide them (according to the godbolt link from the presentation), stack usage, as found by Google, has cubic growth. I don't quite understand why since I guess it's expected to grow linearly, e.g. 48 bytes at a time.
The naive implementation of range would have a range represented of a pairs of iterators, with every iterator containing a copy of each upstream range, which blows up quadratically; libstdc++ implementation is not implemented this way though: each range has one copy of the upstream range plus the predicate, while this could be optimized further for stateless predicates, the range growth should still be linear.
Yet, when I print the final size of the chained filter ranges and I saw it increasing superlinearly with the number of stages, so I must be missing something.
There is also additional wasted stack space for each temporary range. In theory the compiler should be able to optimize them away, but it probably doesn't always.