The author seems mistaken about a couple things relating to hash tables. He doesn't seem aware that, by far, the main reason for using power-of-two hash table sizes is so you can use a simple bitshift to eliminate modulo entirely.
He is also going to run into modulo bias, since using the remainder to calculate a slot position is biased toward certain positions for every table size that isn't a power of two (see https://ericlippert.com/2013/12/16/how-much-bias-is-introduc... for cool graphs). Prime number table sizes do nothing to fix this issue. The power-of-two size with a bitshift is not just faster, it gets rid of the modulo bias.
Fastrange is much faster than modulo, but if his goal is to build the fastest hashtable it's stupid to use a table size of anything except a power of two.

Source: I also wrote a hashtable :)
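For concreteness, here is a minimal sketch (mine, not from the article) of the three mappings under discussion: plain modulo, Lemire's fastrange, and the power-of-two reduction:

    #include <stdint.h>

    /* Works for any table size, but integer division/modulo is slow
       (tens of cycles on most CPUs). */
    static uint32_t slot_mod(uint32_t hash, uint32_t size) {
        return hash % size;
    }

    /* Lemire's fastrange: one multiply and one shift, works for any
       size, but it selects the slot by the HIGH bits of the hash. */
    static uint32_t slot_fastrange(uint32_t hash, uint32_t size) {
        return (uint32_t)(((uint64_t)hash * size) >> 32);
    }

    /* Power-of-two size: the modulo collapses to a bitmask on the
       LOW bits, a single-cycle AND. */
    static uint32_t slot_mask(uint32_t hash, uint32_t size_pow2) {
        return hash & (size_pow2 - 1);
    }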
> He doesn't seem aware that, by far, the main reason for using power-of-two hash table sizes is so you can use a simple bitshift to eliminate modulo entirely.
[...] using a power of two to size the table so that a hash can be mapped to a slot just by looking at the lower bits.
There is a lot of material in the article on power-of-two vs non-power-of-two sizes and it definitely includes a discussion of how POT is convenient due to modulo becoming a bitmask (not a bitshift, which would be division).
That bias graph is made assuming you are picking numbers between 0 and 32768. With a 64-bit hash the bias is negligible. With a 32-bit hash I suppose you could argue significance around 100 million items.
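To put numbers on that (my own arithmetic, same setup as the linked post): the bias comes from the hash range not dividing evenly among the buckets, and the leftover shrinks relative to the bucket share as the hash widens:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t n = 1000;      /* bucket count */
        uint64_t range = 32768; /* 15-bit hash values, as in the linked graphs */
        /* range % n buckets each receive one extra preimage: */
        printf("%llu buckets map to %llu values, %llu map to %llu\n",
               (unsigned long long)(range % n),
               (unsigned long long)(range / n + 1),
               (unsigned long long)(n - range % n),
               (unsigned long long)(range / n));
        /* -> 768 buckets map to 33 values, 232 map to 32: a ~3% skew.
           Redo this with range = 2^32 and the skew is ~0.00002%;
           with 2^64 it is immeasurably small. */
        return 0;
    }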
I think the text is clear that the reason the hash table supports power of two size tables is to avoid the modulo operation.
>The author seems mistaken about a couple things relating to hash tables. He doesn't seem aware that, by far, the main reason for using power-of-two hash table sizes is so you can use a simple bitshift to eliminate modulo entirely.
From the article:
>This is the main reason why really fast hash tables usually use powers of two for the size of the array. Then all you have to do is mask off the upper bits, which you can do in one cycle.
Yeah, I'm gonna need an explanation too. Even with hashes hardened against DoS attacks, the lower bits tend to vary with the input far faster than the upper bits.
For instance, several popular hash algorithms (of the form 31 * hash + value) will have the same upper bits for any two strings of the same length, up to about 12 characters, which is when the seed value finally gets pushed out. Unless you're still using 32-bit hashes for some reason, in which case it's most strings under 6 characters long.
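That claim is easy to check. A minimal sketch (mine; assumes the Java-style h = 31*h + c recurrence run in a 64-bit accumulator):

    #include <stdio.h>
    #include <stdint.h>

    /* Java-style polynomial string hash: h = 31*h + c. */
    static uint64_t hash31(const char *s) {
        uint64_t h = 0;
        for (; *s; s++)
            h = 31 * h + (uint8_t)*s;
        return h;
    }

    int main(void) {
        /* Two unrelated 8-character strings: */
        uint64_t a = hash31("zzzzzzzz");
        uint64_t b = hash31("aardvark");
        /* 31^8 is only about 2^40, so for short ASCII strings the top
           bits of the 64-bit accumulator never get touched at all: */
        printf("top 16 bits: %04llx vs %04llx\n",
               (unsigned long long)(a >> 48),
               (unsigned long long)(b >> 48));
        /* prints: top 16 bits: 0000 vs 0000 */
        return 0;
    }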
Multiplying by a prime and then adding or xoring the next value guarantees that the bottom bits are less biased, so it fits any hashtable that maps hashes with modular math, even if the modulus is taken by bit masking. Getting the upper bits to mix would take a different operator or larger operands.
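For a concrete example of "a different operator": MurmurHash3's 64-bit finalizer (fmix64) alternates xor-shifts with multiplies so high bits feed back into low bits and vice versa. This is a standard avalanching step, not something from the article:

    #include <stdint.h>

    /* MurmurHash3's fmix64: the xor-shifts push high bits down and the
       multiplies push entropy back up, so every output bit ends up
       depending on every input bit. */
    static uint64_t fmix64(uint64_t h) {
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL;
        h ^= h >> 33;
        h *= 0xc4ceb9fe1a85ec53ULL;
        h ^= h >> 33;
        return h;
    }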
I know I've encountered at least one hash function that uses a much larger prime (16 or more bits) as its multiplier, so it's not unheard of, but it's certainly not universal.
But he doesn't know about ctz (counting trailing zeros) for fast load/fill-rate detection. With 50% it's trivial (just a shift), but for 70% or 90% you'd need to use __builtin_ctz.
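I'm guessing at the intent here, but one plausible reading: with a power-of-two capacity, __builtin_ctzll(capacity) gives log2(capacity) for free, so non-trivial load-factor thresholds can be checked with shifts instead of division. A sketch under that assumption:

    #include <stdint.h>
    #include <stdbool.h>

    /* 50% load check: a single shift, no ctz needed. */
    static bool over_half(uint64_t count, uint64_t capacity) {
        return count > (capacity >> 1);
    }

    /* ~90% load check without division: count/capacity > 9/10 is
       count * 10 > 9 * capacity, and with capacity = 1 << k the right
       side is a shift by k = __builtin_ctzll(capacity). */
    static bool over_90(uint64_t count, uint64_t capacity) {
        return count * 10 > (uint64_t)9 << __builtin_ctzll(capacity);
    }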
Not that I disagree with you, but a number of these so-called 'prime' hashtable implementations don't actually use prime numbers. They use 'relatively prime' numbers.
If you happened to use 2^n ± 1 you'd have very little bias according to that map, but you wouldn't strictly be using a power of 2.
Unfortunately Java's original hashtable started at 11 and increased using f(m) = 2 * f(m - 1) + 1, giving you 23, 47, 95, 191, etc. Or pretty close to right on the bad parts of the bias curve shown in that link.
Makes me wonder whether this ever would have been fixed if Java hashtables had been given a default size of 15 or 7...
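For what it's worth, the recurrence f(m) = 2*f(m-1) + 1 maps 2^n - 1 to 2^(n+1) - 1, so starting at 7 or 15 would have kept every size on the low-bias 2^n - 1 curve mentioned above, while starting at 11 pins every size at the 3*2^k - 1 shape. A quick check (my own, not from the thread):

    #include <stdio.h>

    /* f(m) = 2*f(m-1) + 1 preserves the form 2^n - 1, because
       2*(2^n - 1) + 1 = 2^(n+1) - 1. Starting from 11 instead lands
       on 3*2^k - 1, near the bad parts of the bias curve. */
    int main(void) {
        int from11 = 11, from15 = 15;
        for (int i = 0; i < 5; i++) {
            printf("%6d %6d\n", from11, from15);
            from11 = 2 * from11 + 1;  /* 11, 23, 47, 95, 191, ... */
            from15 = 2 * from15 + 1;  /* 15, 31, 63, 127, 255, ... */
        }
        return 0;
    }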