Hacker News | HenryR's comments

You might not be aware that this is one of the seminal papers in distributed systems, if not the seminal paper.

Lamport clocks, described briefly in the review, are an answer to the general problem of establishing an order between events that happen on different machines that respects causality. They're still practical to this day, 40 years after this paper was published.
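For a sense of how simple the mechanism is, here's a minimal sketch of a Lamport clock in Python (my own illustration, not pseudocode from the paper): every local event ticks the counter, and every message receipt first jumps past the sender's timestamp.

```python
# Minimal Lamport clock sketch (illustrative; names are my own).
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the clock."""
        self.time += 1
        return self.time

    def send(self):
        """Attach the current timestamp to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past the sender's timestamp, then tick."""
        self.time = max(self.time, msg_time)
        return self.tick()

# Two processes: a send on A "happens before" the receive on B,
# so the receive is assigned a strictly larger timestamp.
a, b = LamportClock(), LamportClock()
t_send = a.send()
t_recv = b.receive(t_send)
assert t_recv > t_send
```

That last assertion is the whole point: if event x causally precedes event y, then clock(x) < clock(y).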

There are lots of in-depth descriptions of the paper and its contributions elsewhere. I’d totally recommend finding them - it’s a great paper and not a hard read at all.


This might be useful: https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPape...

If you find there is just too much unfamiliar technical language, it’s a good idea to pause and look up a definition. You might need to follow that chain several steps, but that will help you get to an understanding of new terms.


Companies usually provide the space and will sponsor pizza and drinks, at least in SF. That’s about all the costs beyond the time of the organizers which is offered for free.


I went to some Papers We Love meetups here in NYC a few years ago and learned a lot not only from the presenters but from other attendees ranging from database experts to sound engineers. Some very nice, kind people!

Plenty of pizza and beer, too :)

Warmly recommended.


Yep. For many other chapters as well.



Authors try to compare their work against hash tables because, usually, HT represent an upper bound on the performance of point-lookups; we don't know how to do much better in general.

So if your data structure supports range queries _and_ point lookups, you should measure against hash tables to understand how far off the ideal you are. If it's not far, and your data structure is strictly more general, that's compelling.
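A toy illustration of the trade-off in Python (just stdlib, nothing to do with any particular paper): a dict gives fast point lookups but no ordered scans, while a sorted array with binary search is strictly more general at a (usually modest) per-lookup cost.

```python
import bisect

keys = [5, 1, 9, 3, 7]
ht = {k: k * 10 for k in keys}      # hash table: point lookups only
sorted_keys = sorted(keys)          # ordered structure: points AND ranges

# Point lookup: dict is O(1) expected; binary search is O(log n).
assert ht[7] == 70
i = bisect.bisect_left(sorted_keys, 7)
assert sorted_keys[i] == 7

# Range query over [3, 7]: only the ordered structure supports it directly.
lo = bisect.bisect_left(sorted_keys, 3)
hi = bisect.bisect_right(sorted_keys, 7)
assert sorted_keys[lo:hi] == [3, 5, 7]
```

If the ordered structure's point lookups are close to the dict's, you've paid little for the extra generality.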


Good point!


Should be fixed now! (But I've been wrong with CSS before.)


Oh yeah that's no good. I'll fix that when I'm back at my computer. Thanks for pointing that out!


Yeah, Masstree has settled into the standard set of comparison systems for most research since it was published - and not as a strawman ("this system was crap, so let's pretend we've done good work by beating it!") but as a real challenge to do better than.

Anna is on my list of systems to include in this review (see https://www.the-paper-trail.org/reading-list/). Looking forward to it!


Author here - if you like this you might also like another paper summary of mine in the same vein:

https://news.ycombinator.com/item?id=18132730


So I was wondering... are there sensible structures that combine hashmap and tree by having a double index?

Reason: Ordered access or range queries need a (radix-)tree.

So insertion and removal need to pay for the comparatively slow tree search and rebalancing.

Lookups or mutations could use a hash table that references the same data, especially if no key compression is used for storage.


I don't know of one, but it's such a natural idea that I'd guess it's been studied. There are standard implementations of LRU caches that use e.g. a hash map and a linked list to get both fast lookup and ordering, but for real performance I think you'd want to try and minimise the number of data structures to avoid having competing cache behaviours.
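The hash-map-plus-linked-list LRU cache mentioned above can be sketched in a few lines of Python. `OrderedDict` is exactly that combination under the hood: a hash table for O(1) keyed lookups plus a doubly linked list for recency order.

```python
from collections import OrderedDict

class LRUCache:
    """Hash map + doubly linked list in one structure:
    OrderedDict gives O(1) lookup and maintains recency order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now most recently used
cache.put("c", 3)     # over capacity: evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1
```

The point about competing cache behaviours still applies, though: here the two "structures" at least share one allocation and one traversal, which is part of why `OrderedDict` works well in practice.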


Here's a recent comparison of Masstree to ART: https://twitter.com/andy_pavlo/status/986647389820747776?s=2...

ART looks to be better in most cases. It's on my list of K-V stores to review: https://www.the-paper-trail.org/page/reading-list/


Cool, looking forward. Please put Judy in as well :)

Seriously, the 17-year-old Judy is still pretty good, despite its lack of cool vector load / compare / etc. instructions for quickly traversing tree nodes that are too small for a full radix search (ART does "simultaneous search", i.e. compares against all stored keys in a single instruction, while Judy afaik runs a linear search).

It would be pretty cool if someone vectorized that in judy, and replaced null-terminated strings by a binary-safe representation. Unfortunately, all implementations are old and very hard to read.


I could either figure out how Judy works, or review another three papers :)


:)

My super high-level partial understanding of Judy is the following:

Start with ART, which is pretty simple. Then do the obvious improvements: First, we want fast access to "element number N in sorted order", i.e. you also store number of descendants. Next, you do key compression: Storing the portion of the key that can be reconstructed from the tree traversal is silly. Next: an 8-bit tag for signaling one out of 4 node types? Are you, like, filthy rich? More node types it is. Next: Spending 64 bit for a node pointer? Are you crazy? Put the type-tag into the parents pointer (earlier resolution of the branch!), and use a custom allocator to get smaller pointers (you allocate big segments and your node pointers are offsets).

You end up with an unintelligible monstrosity. Give it a cute name, forget about SIMD because it is 2001, and you end up with something like Judy.
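For anyone who hasn't seen the "type tag in the pointer" trick described above: allocations aligned to (say) 8 bytes always have zero low bits, so those bits can carry the node type for free. Here's a sketch in Python using plain integers as stand-in offsets; the node-type constants are made up for illustration, not Judy's or ART's actual encoding.

```python
# Sketch of type-tagging an aligned pointer/offset (illustrative only).
# 8-byte alignment means the low 3 bits of any offset are zero,
# so a node-type tag can live there at no space cost.
NODE4, NODE16, NODE48, NODE256 = 0, 1, 2, 3  # hypothetical node types
ALIGN_BITS = 3                               # 8-byte alignment

def tag(offset, node_type):
    assert offset % (1 << ALIGN_BITS) == 0, "offset must be aligned"
    return offset | node_type

def untag(tagged):
    node_type = tagged & ((1 << ALIGN_BITS) - 1)
    offset = tagged & ~((1 << ALIGN_BITS) - 1)
    return offset, node_type

p = tag(0x1000, NODE48)
assert untag(p) == (0x1000, NODE48)
```

The payoff the comment mentions: the parent can branch on the child's type before dereferencing it, and a custom allocator handing out small aligned offsets shrinks the "pointers" themselves.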


For me the requirement is: plain C interface, usable under MIT/BSD or less-restrictive license

Something gets bonus points for not needing other libraries or static data, so I could put it in a kernel, but usually I don't really need this.

For example, Judy is LGPL. Darn. That alone is trouble, but Judy makes it more questionable due to non-trivial code in *.h files.

Any good options?


Tangential, but Andy's DB lectures are rocking my world:

https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA

