
Separating the storage layer from the database and query layer seems to be the future of databases - there's no reason for every database to re-implement the storage and replication layer just to provide a different query language or data model. This is also what makes FoundationDB so attractive for me. Are there any other large projects doing this?


People often greatly underestimate the performance loss incurred by separating the storage layer from the execution layer. The difference is an integer factor. In a high-performance database design, the execution scheduling behavior must be tightly and intricately coupled to the I/O scheduling behavior, or you leave a lot of low-hanging throughput optimization on the table.

That said, this can be solved by changing the level of abstraction. If you tightly coupled the execution and storage engine as a single module, which is not intrinsically dependent on the type of data model, it would allow high-performance multi-use within reasonable constraints. Albeit with a very different API than a storage engine. But that is not a thing that exists in open source AFAIK.
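
For concreteness, a minimal sketch of what such an API might look like (every name here is invented; this illustrates the shape of the idea, not any existing engine): the engine owns both the storage layout and the execution of pushed-down work, so it can schedule I/O and execution together, while staying agnostic about the data model.

    package enginesketch

    import "context"

    // Expr is an opaque, data-model-agnostic unit of pushed-down work
    // (filter, projection, aggregate) compiled by the layer above and
    // evaluated inside the engine, next to the data.
    type Expr interface {
        Eval(record []byte) (keep bool, out []byte)
    }

    // Task is work the engine may reorder and batch against its own I/O
    // queue -- the execution/I-O coupling described above.
    type Task struct {
        Span [2][]byte              // key range to scan
        Plan Expr                   // pushed-down work to run next to the data
        Emit func(row []byte) error // where surviving rows go
    }

    // Engine owns both the storage format and the execution of Tasks.
    type Engine interface {
        // Submit hands over a batch; the engine decides I/O ordering,
        // prefetching and parallelism instead of the caller issuing
        // blind point reads.
        Submit(ctx context.Context, tasks []Task) error
    }

The point is that the caller describes work against records rather than reads against blocks, and never dictates I/O order.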

The loss from separating storage and execution may not matter for databases where operational efficiency is not paramount (e.g. because the scale is too small for it to matter), but many database engine designers are not going to take that use-case-limiting performance hit, because the CapEx/OpEx implications are large.


I think the point is that now that we're in the era of distributed/cloud/partitioned databases, the query logic cannot necessarily be colocated with the storage, so might as well get the advantages of decoupling.

The performance gains at the distributed level seem to rely on either parallelized compute (à la MapReduce) or over-indexing (à la Elasticsearch/GIN) rather than the sort of efficiency gains that were made at the level of optimizing between CPU/cache/memory/disk. I think the modern concerns for distributed databases are about keeping time complexity from exploding as the size of the data grows.


The query logic/execution is pushed down to the storage in most reasonable distributed database architectures, along with enough metadata that the individual storage engines are fully aware of (their piece of) the execution plan and orchestrate it directly with other nodes e.g. for joins. The reason for local storage is that this will be almost completely bandwidth bound. When you look at growing markets like sensor analytics for edge computing this is the only model that works at scale. Most new parallel databases aren't using a classic MapReduce model or really anything that requires centralized orchestration.
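
To make the pushdown concrete, here is a toy illustration (not any real system's API): the predicate is shipped to each storage node and evaluated next to the data, so the network carries results rather than whole partitions.

    package main

    import "fmt"

    type Row struct {
        SensorID string
        Value    float64
    }

    // Runs on the storage node: evaluates the pushed-down predicate
    // locally and returns only the matching rows.
    func scanLocal(partition []Row, pred func(Row) bool) []Row {
        var out []Row
        for _, r := range partition {
            if pred(r) {
                out = append(out, r)
            }
        }
        return out
    }

    func main() {
        // Two "storage nodes", each holding its own partition.
        partitions := [][]Row{
            {{"a", 1.2}, {"b", 9.7}},
            {{"c", 8.1}, {"d", 0.3}},
        }
        pred := func(r Row) bool { return r.Value > 5 } // pushed-down filter
        var results []Row
        for _, p := range partitions { // in reality these scans run remotely, in parallel
            results = append(results, scanLocal(p, pred)...)
        }
        fmt.Println(results) // only matching rows ever cross the network
    }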

The distributed databases in open source are a bit of a biased sample, as they tend to copy architectures from each other and are a trailing indicator of the state of the art. For example, you virtually never see facilities for individual nodes to self-orchestrate parallel query execution (this is almost always centralized), which entirely changes how you would go about scaling out e.g. join operations or mixed workloads. Building this facility typically requires a tightly coupled storage and execution engine, but open source systems are strongly biased toward decoupling these things for expediency. Open source database architecture is trapped in a local minimum.
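
A very loose sketch of what self-orchestration means in practice (nothing here is modeled on a particular system; goroutines stand in for nodes and channels for the network): every node gets the same plan and peer list and repartitions its own join input directly to the peer that owns each hash bucket, with no coordinator in the data path.

    package main

    import (
        "fmt"
        "hash/fnv"
        "sync"
    )

    // owner maps a join key to the node responsible for its hash bucket.
    func owner(key string, nodes int) int {
        h := fnv.New32a()
        h.Write([]byte(key))
        return int(h.Sum32()) % nodes
    }

    func main() {
        const nodes = 3
        inbox := make([]chan string, nodes) // stand-in for the network
        for i := range inbox {
            inbox[i] = make(chan string, 16)
        }
        localRows := [][]string{{"ada", "bob"}, {"cat", "ada"}, {"bob"}}

        var wg sync.WaitGroup
        for i := 0; i < nodes; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                // Each node routes its own rows; no coordinator decides the shuffle.
                for _, k := range localRows[id] {
                    inbox[owner(k, nodes)] <- k
                }
            }(i)
        }
        wg.Wait()
        for i := range inbox {
            close(inbox[i])
            for k := range inbox[i] {
                fmt.Printf("node %d received %q for its local join\n", i, k)
            }
        }
    }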


In other words, are you saying that distributed databases with local storage can greatly reduce network I/O with things like predicate pushdown, which is not possible when the database is built on top of a distributed file system?


> If you tightly coupled the execution and storage engine as a single module, which is not intrinsically dependent on the type of data model, it would allow high-performance multi-use within reasonable constraints. Albeit with a very different API than a storage engine.

This is what we are doing with FaunaDB: Fauna's own native semantics are essentially a unified intermediate representation for a variety of forthcoming high-level query languages.


Indeed, but it's not a new idea. This is Bigtable's architecture after all, which is now 15 (?) years old. There are a bunch of advantages: DB developers can focus on the DB, not the storage layer (which is in many ways much harder to get right); you can scale storage and processing independently; and you can easily support different tiers of storage (SSD/HDD).

But by far the biggest advantage is operational. Your DB is now stateless! Nodes can die and be replaced in seconds, because they don't store any data; this makes designs like Bigtable's much more practical. You can quickly scale up the processing to deal with spikes without spending hours re-replicating. Operating the storage layer is still inherently challenging, but just has to be figured out once for all your DBs and other processing systems.

While performance can take a hit, this can be mitigated by local caching and good internal networking. It can't reach the lowest latency targets (sub-millisecond), but in practice it works great for web applications.
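
The caching mitigation amounts to a read-through block cache sitting in front of the remote store. A rough sketch (RemoteStore and blockCache are invented names, not any system's API):

    package main

    import (
        "fmt"
        "sync"
    )

    type RemoteStore func(blockID string) []byte // stands in for a network fetch

    type blockCache struct {
        mu    sync.Mutex
        data  map[string][]byte
        fetch RemoteStore
    }

    func (c *blockCache) Get(blockID string) []byte {
        c.mu.Lock()
        defer c.mu.Unlock()
        if b, ok := c.data[blockID]; ok {
            return b // hot blocks never touch the network
        }
        b := c.fetch(blockID) // cold read pays the remote-storage latency once
        c.data[blockID] = b
        return b
    }

    func main() {
        remote := func(id string) []byte { return []byte("block:" + id) }
        c := &blockCache{data: map[string][]byte{}, fetch: remote}
        fmt.Println(string(c.Get("sst-17/0"))) // miss: goes to remote storage
        fmt.Println(string(c.Get("sst-17/0"))) // hit: served locally
    }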


Bigtable was always colocated with the data servers. If a node failed, you had to have other Bigtable nodes serving the same ranges of data.

Though later it went stateless.


I'm not totally clear on the history, but from the Bigtable paper it appears that (at least circa 2006) Bigtable and GFS were not colocated.


CockroachDB is made up of several layers to implement SQL and transactions on top of a distributed key-value store: https://www.cockroachlabs.com/docs/stable/architecture/overv...

TiDB has a similar architecture: https://www.pingcap.com/docs/architecture/
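
The basic trick in both is mapping SQL rows onto an ordered key-value space. A loose illustration (not CockroachDB's or TiDB's actual encoding; the table and index IDs below are made up):

    package main

    import "fmt"

    // rowKey builds a key from table ID, index ID and primary key, so rows of
    // a table are contiguous in the ordered keyspace.
    func rowKey(tableID, indexID int, pk string) string {
        return fmt.Sprintf("/%d/%d/%s", tableID, indexID, pk)
    }

    func main() {
        kv := map[string]string{}
        // INSERT INTO users (id, name) VALUES ('42', 'ada')
        kv[rowKey(53, 1, "42")] = "ada"
        // A point lookup by primary key becomes a single KV get.
        fmt.Println(kv[rowKey(53, 1, "42")])
    }

Because the keyspace is ordered, a table or index scan becomes a contiguous range scan over the KV layer, which is the unit the distribution and replication machinery underneath actually understands.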


The TiDB approach is a lot more like Aurora's, though, as you have much finer-grained control over how many storage and query nodes you run.


Datomic does the same. Storage, query and writing are completely decoupled.

See more https://docs.datomic.com/on-prem/storage.html#storage-servic...


Microsoft’s new Azure SQL DB Hyperscale does something very similar:

https://www.brentozar.com/archive/2019/01/how-azure-sql-db-h...

I sketched out a lot of architecture diagrams in that post, showing how it differs from traditional RDBMSs, and I tried to keep it approachable for database folks from different platforms.


It's too bad it doesn't perform very well. Was looking forward to an Aurora-like SQL Server.

I hope they fix it.


In my eyes, Google pioneered the commercial application of this approach. With Bigtable and its underlying Colossus storage layer, they effectively drove the rise of HDFS and all the big-data tooling from a decade ago.


It depends on the use case: look at the performance of, say, ClickHouse vs. alternatives that separate the storage layer. The performance difference is fairly significant.


Cosmos DB has a similar approach, but it still has a long way to go.



