
Pretty much anyone working in ad tech.

Not necessarily trillions at a time, but even small ad tech firms deal with billions of new data points across many dimensions every day.



Ad Tech has no need for realtime data viewing or aggregation (in this manner), even for platform log data. Offline parallel processing is the standard. Redshift is particularly efficient, while others use Spark or other ad-hoc solutions.

For users, you always want mediation/adjustment steps that can modify realtime data before presenting timesliced totals. For developers/administrators, you want data to be persisted; running totals held only in memory are too fragile to be reliable. In ad tech you assume errors, misconfigurations, and bad actors at all times.


Has no need? Did you just make this up?

We used MemSQL for real-time data for 2 years. All data is fully persistent, but rowstore tables are also held entirely in memory, whereas columnstore tables live mainly on disk. There's nothing fragile about it. SQL Server's Hekaton, SAP's HANA, Oracle's TimesTen, and several other databases do the same.

Timesliced totals are just a SQL query, and whether to put mediation or some other buffer between live numbers and customers is for each business to decide, not some default proclamation for an entire industry.
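To illustrate the point that a timesliced total is just a GROUP BY over a timestamp bucket, here's a minimal sketch using Python's stdlib sqlite3 in place of MemSQL; the table and column names are made up for illustration (revenue is stored in integer cents to keep the sums exact):

```python
import sqlite3

# In-memory database standing in for a rowstore table; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impressions (ts INTEGER, revenue_cents INTEGER)")
conn.executemany(
    "INSERT INTO impressions VALUES (?, ?)",
    [(1000, 10), (1030, 20), (1065, 5), (1070, 15)],
)

# Timesliced totals: bucket timestamps into 60-second slices and sum revenue.
# Integer division (ts / 60 * 60) snaps each timestamp to the start of its slice.
rows = conn.execute(
    "SELECT ts / 60 * 60 AS slice, SUM(revenue_cents) "
    "FROM impressions GROUP BY slice ORDER BY slice"
).fetchall()
print(rows)  # [(960, 10), (1020, 40)]
```

The same shape of query works against live rowstore tables in any of the in-memory engines mentioned above; only the bucketing function differs by dialect.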


Actually, RTB systems do need to process data quickly: RTB stands for Real-Time Bidding, and bids are rejected after 250 ms.
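A minimal sketch of how a bidder might enforce that deadline on its own side, using Python's stdlib concurrent.futures; compute_bid and its return shape are hypothetical stand-ins for real bid logic:

```python
import concurrent.futures
import time

BID_DEADLINE_S = 0.250  # bids arriving after ~250 ms are rejected by the exchange

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def compute_bid(request):
    # Placeholder bid logic; a real bidder would score the request here.
    time.sleep(0.01)  # simulated model/feature-lookup latency
    return {"price_cents": 125, "request_id": request["id"]}

def bid_with_deadline(request):
    """Return a bid dict, or None (no-bid) if we can't answer in time."""
    future = pool.submit(compute_bid, request)
    try:
        return future.result(timeout=BID_DEADLINE_S)
    except concurrent.futures.TimeoutError:
        return None  # too late to be counted; better to no-bid than block

print(bid_with_deadline({"id": "abc123"}))
```

In practice the budget is tighter than 250 ms end to end, since network round-trip time eats into it, which is why bidders keep their hot data in memory.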



