Highly opinionated SQL schema design that's easy to start with and scales well
2 points by hot_gril on April 25, 2023 | 9 comments
Today I'm going to share what I've learned over several years of using relational DBMSes (mostly Postgres). I'm probably not the first to land on this approach, and there are probably people who disagree.

This is how I would build a social media app, as an example. But it works for a wide variety of applications, from social to money to inventory tracking, and it's served me well each time.

1. Make each table a logical collection of events that have occurred, not objects or state. This is not CRUD. One row means one event; for example, each row in a `react` table means "a user reacted to some post." Keep these events as close as possible to the end user's intent (a rough sketch follows this list).

2. Never delete or update rows as a way to record events. This would mean forgetting that something happened. Only do that for manual corrections or special privacy reasons, e.g. GDPR requirements to erase user data.

3. Figure out the present state by aggregating events, either in SQL for realtime requests or perhaps in a mapreduce pipeline for batch. For example, the visible reactions on a post are only the latest one each user made, superseding previous ones.

4. On the technical side, in Postgres always use an `id bigserial` primary key and add a `created_at timestamptz` column for bookkeeping. Some other attributes might be unique, but never make them primary keys, and only create foreign keys to the `id` col. Similar for MySQL. If (for special use cases) you need to understand exactly which event happened first transactionally, order by some other autoincrement column (or just the ID if you're lazy). If you need to do that across multiple tables, make a separate table joining them.

5. Try to only add new tables and cols as needed for new features. It's ok and inevitable to end up with some deprecated ones that you eventually clean up more carefully.

6. Keep everything normalized as a baseline. If some query is too slow to aggregate past data, make an additional denormalized table tracking the current state (i.e. an exception to rules #1-3) alongside the regular ones you insert into in the same transaction.

7. Kinda implied by all the above, don't use ORMs or other heavy-handed ways to abstract away the DBMS. SQL is your interface.

8. You can get pretty far with a monolithic backend/DB this way, but when it's time for microservices: Draw a graph of your tables by foreign keys. Each connected component is a candidate for becoming its own service (with its own DB) if you don't foresee it becoming connected to the others. You aren't going to have cross-service transactions. Introduce caching along these boundaries. Never make a service persist info it's not authoritative on (ephemeral caching is OK). Things should be identified across services by some random UUID or incrementing int, not by some attribute that's believed to be unique at some point in time (e.g. a user's email address). Generally, whichever service observes things happening is the one that owns the data; don't make a service that resembles a custom DBMS for multiple others to R/W to.
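To make points 1, 3, and 4 concrete, here's a rough sketch of what a `react` table and its aggregation could look like in Postgres. The column names, and the `users`/`posts` tables it references, are just for illustration:

    CREATE TABLE react (
      id bigserial PRIMARY KEY,
      created_at timestamptz NOT NULL DEFAULT now(),
      user_id bigint NOT NULL REFERENCES users (id),
      post_id bigint NOT NULL REFERENCES posts (id),
      kind text NOT NULL  -- e.g. 'like', 'heart', or 'none' to supersede an earlier reaction
    );

    -- Present state (point 3): a user's visible reaction on a post is their latest event.
    SELECT DISTINCT ON (user_id) user_id, kind
    FROM react
    WHERE post_id = $1
    ORDER BY user_id, id DESC;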

Why is this so great? You have a clear signal on what exactly to store and what not to. You never lose information. You can always support new features with the existing info you have. You have fewer layers of abstraction between your UI (or API) and your database, which despite some conventional advice actually makes it easier to adapt to new features.



Could you provide an example of how you would implement this approach to store Credit/Debit events for an account? Additionally, how would you handle a scenario where there are 30,000 events on the account, and you need to calculate the balance to prevent overdraft?


Heh, you found the hard case. You want to add a denormalized table (point #6) specifically for locking on the balance, just because Postgres/MySQL `serializable` mode is way too slow to rely on. You still keep the baseline insert-only credits/debits table(s) that you insert into in the same transaction, and all the usual rules apply there.
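Roughly, with a hypothetical `balance` table keyed by account and a single signed `ledger_entry` event table standing in for the credits/debits tables (names are just illustrative), a debit looks like:

    BEGIN;
    -- Lock this account's denormalized balance row so concurrent debits serialize on it.
    SELECT amount FROM balance WHERE account_id = $1 FOR UPDATE;
    -- The application checks amount >= $2 here and rolls back if it would overdraft.
    INSERT INTO ledger_entry (account_id, delta) VALUES ($1, -$2);  -- the insert-only event
    UPDATE balance SET amount = amount - $2 WHERE account_id = $1;  -- keep the denormalized state in sync
    COMMIT;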

You can also do this without making such an exception. I used to keep a separate "pending" table that I'd insert into, commit, then check the balance with the pending row included before moving it to non-pending. So two transactions. That worked; the problem is it was annoying. It was a good solution, though, for debits/credits that involved an async external step that could fail or time out: simply ignore the pending rows that are too old and never got resolved. 30K rows is still small enough to query quickly.
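Sketching that version, with the pending state folded into a flag on the same hypothetical `ledger_entry` table (rather than a separate table) and an arbitrary timeout, just to keep it short:

    -- Transaction 1: record the intent as a pending event, then commit.
    INSERT INTO ledger_entry (account_id, delta, pending) VALUES ($1, -$2, true) RETURNING id;

    -- Balance check counts settled entries plus live pending ones; stale pending rows
    -- (an external step that timed out and never resolved) are simply ignored.
    SELECT sum(delta) FROM ledger_entry
    WHERE account_id = $1
      AND (NOT pending OR created_at > now() - interval '15 minutes');

    -- Transaction 2: if the check and the external step succeed, settle the row.
    UPDATE ledger_entry SET pending = false WHERE id = $3;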


What are the key trade-offs in your approach vs doing event sourcing?


Tbh I had to just look up event sourcing, but thanks, I'll add that to my vocabulary. Seems like my approach includes a form of event sourcing, though I'm not always storing the ordering of events, only in cases where it's needed. And that concept isn't specific to relational DBs.


Why normalize? Go even further and use a universal relation / Entity-Attribute-Value pattern. One table with columns entity_id, type = assert|retract, key, value

Essentially, use SQL as an immutable key-value store and use views for queries.
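A rough sketch of that shape in Postgres, using `attribute` for the key column to match the query below and treating attributes as single-valued for simplicity:

    CREATE TABLE facts (
      entity_id bigint NOT NULL,
      type text NOT NULL CHECK (type IN ('assert', 'retract')),
      attribute text NOT NULL,
      value text NOT NULL,
      created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Current state: the latest event per (entity, attribute) wins; if it's a retract,
    -- the attribute is simply gone.
    CREATE VIEW current_facts AS
    SELECT entity_id, attribute, value
    FROM (
      SELECT DISTINCT ON (entity_id, attribute) entity_id, attribute, value, type
      FROM facts
      ORDER BY entity_id, attribute, created_at DESC
    ) latest
    WHERE type = 'assert';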


With a KV store of denormalized info, how would you store and efficiently query the count of "like" reacts user A made on all posts that have also been liked by user B?


    SELECT count(likes_a)
    FROM facts AS likes_a, facts AS likes_b
    WHERE likes_a.attribute = 'like'
      AND likes_b.attribute = 'like'
      AND likes_a.value = likes_b.value
      AND likes_a.entity_id != likes_b.entity_id
      AND likes_b.entity_id = 123


Ah, I also should've looked up EAV since there's a lot on this. I don't see what you gain from this other than not having to define new tables/cols for new use cases, and the downside is any DB structure has to be enforced entirely in code. I've been down routes like this before (not EAV but other sorta polymorphic DBs), and it's never ended well. Became unsafe to touch the code at some point, especially in a team setting.

On the more technical side, I don't know if Postgres would handle this efficiently. The `attribute` col has the classic low-cardinality problem for indexes. The one big table you use would have many very different access patterns. There are DBMSes more designed for this use case.

I'm also not seeing very much EAV adoption mentioned on Wikipedia. I can see this working for some specialized use cases, though. Has this worked well for you in the past?


I personally think it’s safer to enforce data constraints in application code, because “bad” data won’t break it. Putting the constraints in the database means your application is coupled to whatever database you use. Worse, because databases only offer a small set of constraints, it’s often the case that constraints are split between the database and application code. You have “validations” in both the application code and database definitions, and data must be checked/transformed anyway whenever it is marshaled to and from the database. Constraints in the database are effectively implicit, living in a set of migrations or often not part of any source control.

EAV indexing is much easier, because EAV is naturally represented as a tree. The whole database is the index. You only have four universal indexes that cover every query: EAV, AEV, AVE, and VAE. The mutual likes example above is covered by these indexes. And you never had to worry about it!
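Concretely, on a facts(entity_id, attribute, value) table those four would be something like:

    CREATE INDEX facts_eav ON facts (entity_id, attribute, value);
    CREATE INDEX facts_aev ON facts (attribute, entity_id, value);
    CREATE INDEX facts_ave ON facts (attribute, value, entity_id);
    CREATE INDEX facts_vae ON facts (value, attribute, entity_id);

The mutual-likes query above would presumably be served by facts_ave: filter on attribute = 'like', join on value.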

See Datomic for a real-world example of an EAV architecture.



