Own HashMap
January 22nd, 2008
When needing a container hash tables are very common data structures. They provide efficient key based operations to insert and search for data in containers.
1. What are the advantages of a hashmap in comparison to a simple array or vector and when to use it?
- Hash tables are damn fast because keys are transformed to a hash using a hash-function and a number that is used as an index in an array to locate the desired location (”bucket”) where the values should be.
- Hash tables support the efficient insertion of new entries
- Searching for the required data is often independent of the number of items stored
Hash tables are most useful when large numbers of records are to be stored, especially if the size of the data set can be predicted.
2. Things you should consider when designing a hash map
Most important is the hash function and the collision resolution mechanism. The hash function is responsible for the arithmetic operation that transforms a particular key into a particular table address. The collision resolution mechanism is responsible for dealing with keys that hash to the same address.
A good hash function is essential for good hash table performance.
Recommented for typical hash tables is Bob Jenkin’s LOOKUP3 hash function which is very well performing.
A mathematical byte-by-byte implementation that performs particularly well is the Jenkins One-at-a-time hash, adapted here from an article by Bob Jenkins, its creator. The good thing about it is that it’s very simple and can be understood easily.
uint32_t jenkins_one_at_a_time_hash(unsigned char *key, size_t key_len) { uint32_t hash = 0; size_t i;for (i = 0; i < key_len; i++) { hash += key[i]; hash += (hash << 10); hash ^= (hash >> 6); } hash += (hash << 3); hash ^= (hash >> 11); hash += (hash << 15); return hash; }
If two keys hash to the same index, the corresponding records cannot be stored in the same location. This can cause fatal errors in data storage.
3. How to use
Although the C++ STL (Standard Template Library) does not support hash based containers, most compilers like GCC or Visual C++ have their own implementation.
- wxHashmap are implemented in wxWidgets
- MFC provides CMap and CTypedPtrMap
When getting a bit more comprehension for those maps I tried myself to make a small example:
template < class key, class value, class hash_function = hash<key>, class increment = unit_increment<key>, class equal_key = std::equal_to<key>, class allocate = std::allocator<std::pair<key, value> > >class OwnHashMap { private: typedef std::pair<key, value> Map_pair; typedef hash_table < key, Map_pair, hash_function, increment, equal_key, selectFirst<Map_pair>, };
For better accessing the table an iterator can be useful:
template < class hash_container, class constness_traits ></code> <code>struct hash_table_iterator { typedef typename constness_traits::pointer ptr; typedef typename constness_traits::reference ref;</code> <code>reference operator*()const { return (*this->container)[this->current].item; } pointer operator->()const { return &(operator*()); } };
Now it can be used that way:
typedef MyMap<int,double> Map; typedef Map::value_type value_type; typedef Map::iterator iterator;Map map; MyMap.add(value_type(1, 1.1)); MyMap.add(value_type(3, 3.23));int n = map.size(); //access values for ( iterator it = map.begin(); it != map.end(); ++it ) { std::cout << it->first; std::cout << it->second; }
That’s just the easiest way of implementing a hashmap. That solution could easily be extented with sort algorithms or whatever you need for better data-storing.
Finally I’d like to advice Google SparseHash to you.
Google SparseHash project contains several hash-map implementations in use at Google, with different performance characteristics, including an implementation that optimizes for space and one that optimizes for speed. The memory-optimized one is extremely memory-efficient with only 2 bits/entry of overhead.






Excellent explanations! You made me understand a complex issue very easily.
Comment by Jeff — January 23, 2008 @ 7:48 pm
Thanks a lot
Comment by Kai — January 23, 2008 @ 7:50 pm
Thanks a lot for the advice. The sparsehash project is exactly what I was looking for…
cu Ralph
Comment by Ralph — August 14, 2008 @ 4:26 pm