refactoring - unified hashtables
Problem
There are currently two different hashtable implementations.
- Array hashtable in HAT-trie, that allows overfilling
- Hopscotch hashtable in RRL
From my point of view, Hopscotch hashtable is extremely efficient and quick, so for this reason I intend to remove ahtable completely and rip the Hopscotch hashtable into a separate file with proper API, documentation and tests.
What do we need
I would prefer to have internally chained table for RRL to prevent attacker controlling the amount of allocated memory. For HAT trie, it would also be better to not have an overfilled hashtable, so we can split tables only after they are full to conserve memory. Also, we don't need to resize the tables or any advanced functionality. HAT trie requires an ordered index of the keys in the table and a "iterator-like" interface.
Why I ditched the ahtable and went for Hopscotch
Hopscotch hash is useful for both uses and cleverly solves the performance degradation when filled. Each hashbucket has a bitvector of N adjacent buckets (uint32_t size). When inserting, element goes to the first bucket which bitvector index is 0. When the bitvector is full, table is scanned for a free slot and then bucket bitvectors are reordered until the free space is made in the vicinity of the target hashbucket. This probably doesn't make much sense, does it? Have a look at the http://en.wikipedia.org/wiki/Hopscotch_hashing
TL;DR It is internally hashed and memory efficient.
What else was changed
Trie used to differentiate between PURE and HYBRID buckets.
Pure buckets consisted of a keys that started with the same letter.
Hybrid buckets consisted of a keys without any specific requirements.
I changed the semantics and the trie split function, so now the trie node is either well... trie node or a bucket. This again makes things simple.