bambo222's comments

Why would you do cut|sort|head? You should instead just ask the k-way sorted merge question about external sorting.

As a FAANG data scientist, I've never once wanted to use cut|sort|head, nor have I wanted to work with CSVs. Everything is already sharded and stored in a schema-enforced binary encoding like protobuf or Thrift. The files are so large that it's better to use Apache Beam or an equivalent to parallelize aggregations of particular fields over very large amounts of data. But hopefully you just use a SQL-like interface such as BigQuery that, when pointed at sharded files, does the aggregations for you in a SQL-like language (which kicks off distributed computing jobs under the hood and is not truly relational). Unless you're streaming data, in which case that's another question.
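
To make the "aggregate a field over sharded data" idea concrete, here is a rough single-process sketch in plain Python. It is not the Beam or BigQuery API; the function names (aggregate_shard, aggregate) and the dict-shaped records are made up for illustration. Each shard is reduced to a partial count on its own, and the partials are then merged, which is roughly what a distributed job does across workers.

```python
from collections import Counter


def aggregate_shard(records, field):
    """Map step (hypothetical helper): count values of one field in one shard."""
    return Counter(record[field] for record in records)


def aggregate(shards, field):
    """Combine step: merge the per-shard partial counts into one total.

    In a real system each aggregate_shard call would run on a separate
    worker; here they simply run one after another.
    """
    total = Counter()
    for shard in shards:
        total.update(aggregate_shard(shard, field))
    return total


if __name__ == "__main__":
    # Two toy "shards" of already-decoded records (assumed shape).
    shards = [
        [{"country": "US"}, {"country": "DE"}],
        [{"country": "US"}, {"country": "JP"}, {"country": "US"}],
    ]
    print(aggregate(shards, "country").most_common(2))
```

Because the partial counts merge associatively, it doesn't matter how the data is split across shards or in what order the partials are combined.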

Testing Unix commands is narrow-minded IMO. If you want to test divide and conquer plus streaming, then just ask a flavor of that LeetCode question.
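
For reference, one flavor of that question could be answered with something like the sketch below: an external sort that splits the input into sorted runs spilled to temp files (divide), then streams a k-way merge over them (conquer). The names sorted_runs and external_sort and the run_size parameter are my own, and it assumes the input is an iterable of strings with no embedded newlines.

```python
import heapq
import itertools
import tempfile


def sorted_runs(lines, run_size):
    """Divide: sort fixed-size chunks in memory and spill each to a temp file."""
    it = iter(lines)
    runs = []
    while True:
        chunk = sorted(itertools.islice(it, run_size))
        if not chunk:
            return runs
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(line + "\n" for line in chunk)
        run.seek(0)
        runs.append(run)


def external_sort(lines, run_size=100_000):
    """Conquer: lazily k-way merge the sorted runs, yielding lines in order."""
    runs = sorted_runs(lines, run_size)
    try:
        # Strip newlines before merging so the merge compares the same
        # strings that were compared when each run was sorted.
        streams = [(line.rstrip("\n") for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


if __name__ == "__main__":
    data = ["pear", "apple", "fig", "kiwi", "date"]
    # Rough equivalent of `sort | head -n 3`, with only 2 lines in memory per run.
    print(list(itertools.islice(external_sort(data, run_size=2), 3)))
```

Only run_size lines are held in memory while building runs, and the merge keeps one line per run in its heap, so the same shape works for inputs that don't fit in RAM.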

