The banner seen at the start of the video can be translated as "Starting May 22nd, this building is going crazy." It's a play on words that also means the building is being moved starting May 22nd.
Prometheus is a (rare) recent example of a significant, thriving open-source project that is community-based. Not quite as broad in scope as something like Elasticsearch though.
Depending on what "community-driven" means, I'd like to add Python, Blender, Jupyter, Django ...
I'd say "thriving community-based open source projects" are rare in the sense that most open source projects don't thrive (especially if you count every open-licensed repository on GitHub), but there are tons of examples.
OP here, my bad indeed.
I was fiddling around trying to figure out the right way to shorten it and carelessly ended up with this. No intention to mislead, but I agree it changes the meaning in a bad way.
Most importantly, I don't see a way to revert it; hopefully the moderators will see this and do it for me.
Histograms require you to configure buckets into which your samples are allocated, and to allocate the buckets appropriately you need to know what your expected values are; that is, to measure latency, you need to know your latency. While this can work (I think most of us have a clear idea, or can obtain one, of what our typical latencies are, and can configure buckets around that), it is inelegant. I feel like I would rather have X=percentile, Y=latency, but such a bucketing gives you X=latency, Y=request count. Still useful, but only as informative as you are good at choosing buckets. (There is the histogram_quantile function, but I'm not convinced its assumption of a linear distribution within each bucket makes much sense: most things are long-tail distributions, so I would think that once you get past the main "hump" of typical latencies, most samples would cluster towards the lower end of any particular bucket.)
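To make that concrete, here's roughly what the up-front bucket choice looks like with the Go client library (the boundaries below are just illustrative guesses, which is exactly the problem):

    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    // The bucket boundaries have to be chosen before you have any data,
    // which is the point above: to measure latency well, you already need
    // a rough idea of your latency. These values are illustrative only.
    var requestLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "Request latency in seconds.",
        Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5},
    })

    func init() {
        prometheus.MustRegister(requestLatency)
    }

    // Observe drops each sample into whichever bucket it falls in; the
    // server only ever sees per-bucket counts, never the raw values.
    func RecordRequest(seconds float64) {
        requestLatency.Observe(seconds)
    }

On the query side you then invert that with something like histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])), which is where the linear-interpolation-within-a-bucket assumption comes in.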
I am not clear on how Summaries actually work. They appear to report the count and sum of the thing they're monitoring; that is, if one were to use them for latencies (and the docs do indeed suggest this), they would report values like "3" and "2000ms", indicating that 3 requests took a total of 2000ms between them. How is one supposed to derive a latency histogram/profile from that?
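From what I can tell, a Summary is declared roughly like this with the Go client (a sketch under my reading of the docs): the _count/_sum pair is always exported, and any percentiles come from the optional Objectives map, which the client computes locally.

    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    // A Summary always exports <name>_count and <name>_sum, so
    // rate(_sum)/rate(_count) gives an average latency, not percentiles.
    // Percentiles only appear if Objectives are configured, and those are
    // computed inside the client process itself.
    var requestLatencySummary = prometheus.NewSummary(prometheus.SummaryOpts{
        Name: "http_request_duration_seconds",
        Help: "Request latency in seconds.",
        // Quantile -> allowed error, e.g. the 0.99 quantile within 0.1%.
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
    })

    func init() {
        prometheus.MustRegister(requestLatencySummary)
    }

The catch seems to be that quantiles computed per instance this way can't be meaningfully aggregated across instances; only the counts and sums can.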
Prometheus's fatal flaw here, IMO, is that it requires sampling of metrics. That works for things like CPU, which is essentially a continuous function that you sample over time, but its collection method/format doesn't seem to work that well for event-based metrics, such as request latency, which only happen at discrete points. (If no requests are being served, what is the latency? The question makes no sense, unlike with CPU usage or RAM usage.)
To me, ideally, you want to collect all the samples in a central location and then compute percentiles there. Anything else seems to run afoul of the very "computing percentiles on the agents, then 'averaging' percentiles at the monitoring system" critique pointed out in the video posted in this sibling comment: https://news.ycombinator.com/item?id=18194507
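A back-of-the-envelope sketch of what I mean by "compute centrally" (a hypothetical helper, nothing Prometheus-specific): collect the raw observations somewhere, then take the percentile over the whole set.

    package metrics

    import (
        "math"
        "sort"
    )

    // Percentile computes the p-th percentile (0 < p <= 1) from raw samples,
    // the way a central collector could if it received every observation.
    // Averaging percentiles that agents computed independently does not, in
    // general, give you this number.
    func Percentile(samples []float64, p float64) float64 {
        if len(samples) == 0 {
            return math.NaN()
        }
        sorted := append([]float64(nil), samples...)
        sort.Float64s(sorted)
        idx := int(math.Ceil(p*float64(len(sorted)))) - 1
        if idx < 0 {
            idx = 0
        }
        return sorted[idx]
    }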
Your points are largely valid, but Prometheus is a monitoring solution, not a scientific or financial tool.
Certain tradeoffs are made because the monitoring aspect comes first and scientific correctness comes second.
Hence poll vs push, for instance.
M3QL is still the primary query language internally; however, the query service is still being rebuilt in open-source M3, so it's not available just yet.
As to why we've kept it: over time we've added a large number of functions that don't really have an alternative in, say, PromQL, so we aim to offer both PromQL and M3QL in open-source land and let end users use either one. I don't think either is better than the other; they're just different flavors (function expansion vs. pipe-based, etc.).
We are working on making WarpScript (http://www.warp10.io) able to fetch data from M3, so you can benefit from 850+ functions for analyzing your time series data.
We ran into a number of issues with Helm when deploying: failures that forced us to roll back, with the rollbacks then failing and requiring manual changes to unblock.
I think that for third-party packages and related templating (which seems like the original use-case) it works well, but I would be wary of using it for high-res deploys of our own stuff.
The pipelining benchmark is identical to that of Japronto (another, very similar thing posted here on HN a few days ago). Japronto's repo on GitHub holds the wrk pipelining script used.
I haven't had the time to add configurations for every server tested (esp. Apache & NGINX), but the main point here is to showcase the performance difference between plain Node.js and Node.js with µWS.
We don't need to take his word for it. It's open source, so we can run the tests ourselves.
I think it's completely understandable that he threw in the others, probably default config, without caring much about it since they weren't the point of the writeup.
It has a mostly compatible API, but strict conformance doesn't seem to be the goal here. If your application doesn't make use of obscure features provided by core http (and if it does, it could probably be refactored to do without them), then it's a free performance boost.
It's taken from close up, so you can see how a concrete platform has been poured under the building and then the whole thing is pushed along on rails.