A security-focused rewrite of a security-critical program that removes insecure features and prevents whole classes of vulnerabilities from being introduced in the future is hardly “useless”.
1) Every team does something different because none of them talk to each other. There are very few horizontal programs across engineering there. As a result, processes and results vary greatly.
2) They're very "traditional" in many ways. They're not a fast-moving, engineering-led company; they're a slow-moving, business- and marketing-led company. Engineering is not their secret sauce (except perhaps some bits of hardware engineering). They are sometimes the sort of org that asks why bother with automated tests when we have a QA team.
Umbra is not an in-memory database (HyPer was). TUM gave up on the feasibility of in-memory databases several years ago (when the price of RAM relative to storage stopped falling).
Thanks for the correction. My understanding was that it was still in-memory but "fell back on" disk. ART indexes were touted as one of the novel aspects of Umbra, and my understanding is that ART doesn't work well as an on-disk data structure, so I guess I need to read up on the architecture now.
No, again, ART was HyPer's specialty. You're right that ART specializes in in-memory workloads; it's not amenable to paging.
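To make that concrete, here's a minimal sketch (my own, not code from the ART paper) of ART's smallest inner node type. The structure is raw in-process pointers all the way down, which is exactly why it doesn't page well -- there's no stable on-disk identifier to swizzle.

  #include <stddef.h>
  #include <stdint.h>

  /* Sketch of ART's smallest inner node ("Node4"). An adaptive
     radix tree dispatches on one key byte per level; children are
     held by raw pointers that are only meaningful inside one
     process's address space, so a node can't simply be written
     out to disk and paged back in elsewhere. */
  typedef struct art_node4 {
      uint8_t  num_children;
      uint8_t  keys[4];        /* key bytes, kept sorted */
      void    *children[4];    /* raw in-memory pointers */
  } art_node4;

  /* Return the child for one key byte, or NULL if absent. */
  static void *art_node4_find(const art_node4 *n, uint8_t byte) {
      for (uint8_t i = 0; i < n->num_children; i++)
          if (n->keys[i] == byte)
              return n->children[i];
      return NULL;
  }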
I believe Umbra is heavily B-tree based, just like its cousin LeanStore.
One of its specific innovations is its buffer pool, which uses virtual memory overcommit and multiple possible buffer sizes to squeeze better performance out of page management.
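Roughly (this is my own simplification of the idea, not code from the Umbra paper): you reserve a huge virtual range per page size class up front and let the OS overcommit the address space, then back a frame with physical memory only when a page is actually loaded into it, so multiple page sizes coexist without fragmenting RAM:

  #include <stddef.h>
  #include <sys/mman.h>

  /* Reserve a large virtual range for one page size class without
     backing it with physical memory. On 64-bit, address space is
     effectively free, so each size class can be reserved in full. */
  static void *reserve_size_class(size_t total_bytes) {
      return mmap(NULL, total_bytes, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  }

  /* Back one buffer frame with real memory just before reading a
     page into it. */
  static int commit_frame(void *frame, size_t frame_bytes) {
      return mprotect(frame, frame_bytes, PROT_READ | PROT_WRITE);
  }

  /* On eviction, release the physical pages but keep the virtual
     reservation, so the frame's address stays stable. */
  static void evict_frame(void *frame, size_t frame_bytes) {
      madvise(frame, frame_bytes, MADV_DONTNEED);
      mprotect(frame, frame_bytes, PROT_NONE);
  }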
My understanding is that the research projects LeanStore & Umbra -- and now, I assume, the product CedarDB, based on the people involved, etc. -- are systems built on two observations: a) existing on-disk systems weren't designed with the characteristics of NVMe/SSD drives in mind, and b) RAM prices up to this year were not dropping at the rate they were in the early 2010s, so pure in-memory databases were no longer so competitive and it became important to squeeze performance out of systems that page. And of course in the last six months this has become extremely relevant with the massive spike in RAM prices.
That and the query compilation stuff, I guess, which I know less about.
In-memory DBs were always a dead-end waste of time. There will always be bigger, slower, cheaper secondary storage, no matter how cheap main memory gets. And workloads always grow to exceed the available main memory (unless a system is dying). And secondary storage will always be paged, because it's too inefficient to address it at smaller granularity.
That's just reality, and anyone who ignores reality isn't going to get very far.
Thanks for the corrections and info. Will check out that video. I could have sworn I had read these things about Umbra, but I suppose it was HyPer. Both interesting designs!
I am currently writing an on-disk B+tree implementation inspired by LMDB's blazing fast memory-mapped CoW approach, so I'm quite keen to learn what makes TUM's stuff fast.
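For anyone curious what that looks like in practice, here's the core of the CoW idea as I understand it (a toy sketch with hypothetical pager helpers, not LMDB's actual API): a write never touches a live page; it copies the leaf, then copies-and-patches each parent up to a brand-new root, which readers only see once it's published:

  #include <stdint.h>
  #include <string.h>

  #define PAGE_SIZE 4096
  typedef uint64_t pgno_t;

  /* Hypothetical pager interface (stand-ins for illustration). */
  extern void *page_alloc(pgno_t *out_pgno);       /* fresh page   */
  extern void *page_get(pgno_t pgno);              /* map existing */
  extern void  patch_child(void *node, pgno_t old_child,
                           pgno_t new_child);      /* fix pointer  */

  /* Copy-on-write one page: write a fresh copy, never modify the
     original, and return the copy's page number. Readers holding
     the old root keep seeing the old page untouched. */
  static pgno_t cow_page(pgno_t old, void **out_copy) {
      pgno_t fresh;
      void *dst = page_alloc(&fresh);
      memcpy(dst, page_get(old), PAGE_SIZE);
      *out_copy = dst;
      return fresh;
  }

  /* To update a leaf, copy every page on the root-to-leaf path
     (path[0] is the root). The actual leaf edit is omitted. */
  static pgno_t cow_update_path(const pgno_t *path, int depth) {
      pgno_t child_new = 0;
      for (int i = depth - 1; i >= 0; i--) {
          void *copy;
          pgno_t fresh = cow_page(path[i], &copy);
          if (i < depth - 1)
              patch_child(copy, path[i + 1], child_new);
          child_new = fresh;
      }
      return child_new;   /* new root: publish it atomically */
  }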
Yeah, I think the way Umbra was pitched when I watched the talks and read the paper was more as a "hybrid", in the sense that it aimed for something close to in-memory performance while optimizing the page-in/page-out performance profile.
The part of Umbra I found interesting was the buffer pool, so that's where I focused most of my attention when reading, though.
Maybe there's an intermediate category: people who like designing software? I personally find system design more engaging than coding (even though I enjoy coding as well). That's different from just producing an opaque artifact that seems to solve my problem.
He has built a successful brand by drawing sweepingly cynical conclusions from some rather limited experience (along with clickbait titles), to appeal to cynical audiences like HN.