Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I see several issues:

- No out of the box horizontal sharding, according to the post they had 4TB (compressed) data in the cluster in 2017. Looking at their growth I think it is safe to assume that today they would have >50TB which can't be done on a single node. You could use Citus but this is not exactly vanilla Postgres anymore. For such a simple data model wasting time implementing your own sharding solution and (more importantly) shard migration makes no sense.

- Discord is storing text data, in Postgres this will be stored in TOAST tables which has some drawbacks.

- Their workload is mostly inserts, almost no updates. Vacuum only operates on complete tables so you would wast I/O and CPU processing data which you don't even touch. You can partition tables but it's a manual process and you have to make compromises. In 2017, Postgres partitioning still had many performance drawbacks.

- No out of the box redundancy.

- Once your data doesn't fit in memory, Postgres performance becomes unpredictable.

Personally I would have chosen ElasticSearch for this project.



Why would the text data be stored in TOAST?

My understanding was PG only uses TOAST when the data is too large to fit in the row, and since PG compresses data before inserting wouldn't user messages be fine?


I think Discord messages can be unbounded in size


Interesting, trying it out on Discord the default max msg size is 2000 chars and with Discord Nitro the max is raised to 4000 chars.

Testing with Postgres, a 2000 char random sequence doesn't result in TOASTing, but a 4000 random sequence does get TOASTed

And for kicks, 4000 chars that aren't random compress well enough that they don't end up in TOAST.


Default for TOAST is 2KB and how many chars you can fit in this space depends on encoding, could be as low as 1000 uncompressed.

I think this limit can be adjusted, if you know that your message limit is 4000 chars its definitely a good idea.


Do you have any case studies for ElasticSearch that you can recommend, in projects similar to this? Would be very interested in seeing what that option would look like.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: