Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The larger your data, the more that indexing and maintaining them hurt you. This is why they do much better at larger datasets vs small data sets. It’s all about trade offs.

To overcome this, they make use of cache and if the small data is frequently accessed, the performance is generally pretty good and acceptable for most use cases.



Did anyone else notice the surge of brand new accounts that are appearing on these discussions of Databricks with pro-Databrick opinions?

If we had access to IP address of the posters, I sure would be interested in looking at correlation among them.


What about my comment above is pro-Databricks? Snowflake works the same way. So do most large scale DW insert Exadata, Netezza, etc...

Does anyone else notice people questioning common sense?


with most people working from home, not sure if this heuristic works.

disclaimer: works for databricks, but not on spark, and first time posting in this thread




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: