Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Because I've never thought about this, I'll ask the dumb question...

A shopping cart has many items. An item belongs to a shopping cart. In a relational database, without foreign keys, how do you associate the shopping cart with the items?



The suggestion is to not use foreign key constraints in the database.

The suggestion is not to eschew columns which might be JOINed on in a query.

I.e. "Don't enforce FK relationships with a constraint in the DB. Make sure the values in both tables which may be joined are consistent by using application code to enforce this."


> Don't enforce FK relationships with a constraint in the DB. Make sure the values in both tables which may be joined are consistent by using application code to enforce this.

For those reading at home who aren't good with databases. This is exactly the same statement as:

"Don't enforce valid inputs on the backend. The javascript front-end will enforce it for us".

Always validate your inputs at the appropriate layer. Your backend can never trust input from a web front-end (even if there is "only one web front-end"). Likewise, your database should never trust input from its clients (even if there is "only one client"). Ignore this rule at your own peril.


Except hopefully your customers do not have direct access to your backend.


Are your engineers omnipotent beings who never fail, never make mistakes and understand the system entirely (even all the parts they don't know about)? Is your QA process perfect? Your servers never crash in weird states?

Lucky for you!


A foreign key is a hard-constraint that must be enforced.

There's no stopping you from storing item IDs in a theoretical shopping cart table without FK constraints though. It's just that the onus is on your application to provide the guarantee that the item exists.

This becomes a little clearer if your item IDs are not simply auto-incrementing IDs (which have their own challenges in a large distributed system), but instead are SKUs, or things which have actual meaning.


If you have a need for a ID that is unique but otherwise meaningless in a distributed system, feeding a concatenated node ID, timestamp (to ms), and looping transaction counter (mod a sufficiently large number) into a secure hash like SHA-256 should take care of it.


Sure, that satisfies the uniqueness aspect if that's all you're looking for in an ID. But it doesn't satisfy the predictability aspect.

You can't know what the ID will be ahead-of writing the item to the database, nor store the item in a shopping cart without retrieving the item via a different piece of information (which is pretty likely to be SKU in this case)


When there aren't any connections to known unique keys (like SKUs for shopping), PK predictability isn't really possible (or necessary). The above formula is just an alternative to using auto-incrementing integers when in a distributed environment.


In a non-relational model, you'd generally copy the items/products and their quantity into the shopping cart.

That's a huge change from the relational model, and takes much more space than a normalized approach, but it also comes with some advantages. For example, it makes sure that the price of the items in the cart remain constant, even when you change product prices. That way, you don't have to inform the user "oh, your cart just got 8% more expensive" before checkout.


Agreed. Invoices, for example, are a representation of history so they need the exact data that was used at the time. They can have a reference to a product id but at any time it's possible for the data associated with that product to be completely changed from the meaning at the time of the invoice. This happens with, e.g., UPCs, which can be reused. On the other hand, if you have assemblies of products then you probably want those items to update themselves automatically.


It's confusion over the terminology. There are two definitions of "foreign key" in wide use:

(1) When one table's key occurs as a value in another table, in that second table that column is sometimes called a foreign key. For example, table A has id, table B has A_id, and people refer to A_id as "a foreign key".

(2) When you define this relationship explicitly in the database, so that the database can enforce it (and/or to document it), that's called a "foreign key constraint" but also sometimes just a "foreign key".

This article just means they don't use #2. It doesn't mean they don't use #1.

The phrasing was probably clearer in the context of the discussion they were having at the time because the comment above it linked some docs for the software in question (qh-ost), and under the "Limitations" section, those docs said, "Foreign key constraints are not supported." So to them it was clear they were talking about database constraints.


I agree, whether or not a given data model has foreign keys is a math question with a correct answer (up to isomorphism of the data model); if it does, where in a system they should or should not be enforced is an entirely separate, implementation (vs semantic) level issue.


you are talking about relation, in this context fk is an abstract concept

linked article describes fk as an dbms construct - a schema constraint

relations and conceptual foreign keys may exist without dmbs constraints, in such scenario dmbs does not guard validity of conceptual "foreign keys" and treats them as usual data

for example consider schema:

user{id,name}; book{id,title}; library_borrow{id,user_id,book_id,date}

if you want you may define a constraint:

ALTER TABLE library_borrow ADD CONSTRAINT FOREIGN KEY (user_id) REFERENCES user(id) CONSTRAINT fk_library_borrow_user_id;

if you do not define such constraint then dbms will not verify existence of records in `user` table when you insert data in library_borrow

absence of constraints affects also indexes - if you do not define a constraint you usually have to manually define an index on fk column


The association would still exist implicitly in the code that uses the database. The columns could stay the same, and the code could still use the item's hypothetical cart_id column to get the items in a cart.

You lose the explicitness and safety of referential integrity in the database, but it's a trade off that can enable other things, like sharding, as mentioned by the GitHub employee.


In what way does sharing make it impossible to have database enforced referential integrity?


You do not have to have a foreign key defined to do a join across tables. Foreign keys just enforce that all the values in the given column of a table are present in another given column on a specified table (usually a second table). They prevent you from entering data that does not match up.


The way I've usually seen it done is that the foreign key columns still exist. The relationships are there, the database just doesn't enforce them.

The items table might have a shopping_cart_id column


A FK declaration uses named field(s) in your table. You can omit the FK but the associative fields remain. An FK is a constraint on what values can be in those fields.


There is no difference when not having foreign key constraints.

In both cases you can't delete items from your catalog or you'd break old carts. So you just don't do it.

Another way is to copy the product from the product table to the cart table. This saves you from making references to versions of products.


An item should never be deleted from a catalog. That would break a lot more than carts.

If you need to track the inventory status/availability of an item, then you store that in the database. The specific implementation (field, list of current items, etc) depends on your needs.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: