HashMap basics and its internal implementation


Hi, I am Malathi Boggavarapu working at Volvo Group and i live in Gothenburg, Sweden. I have been working on Java since several years and had vast experience and knowledge across various technologies.

In this post we will learn about basics of HashMap and its internal implementation.


What is HashMap?


HashMap implements Map interface. Collection view of a Map can be obtained using entrySet() method. To obtain a collection-view of the keys, keySet() method can be used.

HashMap is not synchronized and allows null keys and values whereas HashTable is synchronized.

An instance of HashMap has two parameters that affect its performance: 

initial capacity 
load factor. 

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. 

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). 

The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.



Ex:


package mapEx;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class HashMapEx {
public static void main() {
HashMap<Key, String> map = new HashMap<Key, String>();
map.put(new Key(1, "SV"), "Sweden");
map.put(new Key(2, "GE"),"Germany");
map.put(new Key(3, "NO"),"Norway");
Set<Map.Entry<Key, String>> set = map.entrySet(); // returns key-value pairs in the form of Map.Entry object
Iterator<Map.Entry<Key, String>> it = set.iterator();
while(it.hasNext()) {
Map.Entry<Key, String> mapEntry = (Map.Entry<Key, String>)it.next();
System.out.println("key :: " + mapEntry.getKey());
System.out.println("value :: " + mapEntry.getValue());
}
Set<Key> keys = map.keySet(); // returns only the keys
}

}


class Key{
int index;
String name;
public Key(int index, String name) {
this.index = index;
this.name = name;
}
public boolean equals(Object obj){
  return super.equals(obj);
}
public int hashCode(){
  return this.hashCode();
}
}


Internal implementation of HashMap

It is very important to understand how HashMap works internally. This section guides you about it.

Hashmap uses Array, Linked list and balanced trees datastructure internally for storing key-value pair. Balanced tree is introduced from Java 8. will see more about it later.

Hashmap works on principle of hashing and internally uses hashcode as a base, for storing key-value pair. With the help of hashcode, Hashmap stores objects and retrieves it in constant time O(1).

put() - In the above example, we are inserting key and value pairs. when we put, the hashcode of an object Key is calculated; in our example hashCode method is called to get hashcode and indexFor(hash, table.length) method is called finally to find the bucket. Bucket is nothing but an index of the Array to store the element.

If the same hashCode is returned for multiple elements in the hashmap, they all will be stored in the same bucket. This is called "hash collision". Here a linkedList is used to store the elements inside the same bucket. But there is a serious problem with this. If many elements were stored in the same bucket, the size of linkedList will be big, and get() method will do a linear operation on linkedList with equals method which makes the performance worst for retrieving elements. Here comes balanced trees into picture. This was instroduced from Java 8. Once linkedList reach certain threshold, it starts using balanced trees which will improve the worst case performance from O(n) to O(log n)

get() - This method uses hashCode of the Key object to find the bucket. After the bucket is found, it retrives the element and returns it to the user.


Some more info about HashMap

Immutability of keys:

If the object which is used as key in hashMap is modified, then we may have problem retrieving values from hashMap.
Let's say I use an object as key, and put this key and associated value into hashMap. Later I modify one of the properties of this key. If hashCode() and equals() method make use of this property, then we may not find this key in hashMap. If the property being modified is not being used by hashCode() and equals(), then we will still be able to find the key in hashMap. 

To do away with this issue altogether, recommendation is to use immutable classes as keys.

Load factor and resize:

When a hashMap resizes, it will double in size and create a new instance and populate it.

When new hashHap is being populated, the linkedList associated with each bucket of source hashMap is iterated and nodes are copied to the destination bucket. However, note that these new nodes are prepended to the head of the destination linkedList. So resizing has an side effect of reversing the order of the items in the list. Default load factor for hashMap is 0.75.

Concurrency:

Do not use HashMap in multi-threaded environment. What if 2 threads starts resizing the hashMap at the same time?
ConcurrentHashMap is used for multi-threaded environment.

Map of maps:

HashMap of hashMaps are very popular.

Ex:
Map<String, Map<String, Object>> multiHashMap = new HashMap<>();
Map<String, Object> myHashMap1;
Map<String, Object> myHashMap2;
multiHashMap.put(“one”, myHashMap1);
multiHashMap.put(“two”, myHashMap2);

Some specialized hashMaps for specific purposes:

ConcurrentHashMap: HashMap to be used in multithreaded applications.
SynchronizedHashMap: HashMap that is synchronized.
EnumMap: HashMap with Enum values as keys.
LinkedHashMap: HashMap with predictable iteration order (great for FIFO/LIFO caches)
TreeMap

Comments

Popular posts from this blog

Bash - Execute Pl/Sql script from Shell script

How to get client Ip Address using Java HttpServletRequest

How to install Portable JDK in Windows without Admin rights