Tuesday, December 9, 2008

Consistency

In a recent Java refactoring project I was updating some threading code and came across a HashMap that needed to be converted to a ConcurrentHashMap. I made a one line change from:

Map myMap = new HashMap();

to

Map myMap = new ConcurrentHashMap();


It seemed like a good idea before I started working other parts of the threading problem to run my my unit tests just to see what changing the implementation would do. What happened? Good old Java Null Pointer exception. The problem it turned out was due to the get() method. get() was being called with some null key and ConncurretHashMap was throwing a NPE. Why didn't this happen with HashMap? I thought perhaps I should double check the Javadoc for ConcurrentHashMap just in case I made a silly assumption they both had the same interfaces and contracts. Here is the Javadoc excerpt from the description in the first paragraph of the ConcurrentHashMap page.
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
They say (ignoring gritty details of threading) that these two classes are interchangeable. The problem is that the devil is in the details and the details are the way Java handles uncaught exceptions. They both extend the same classes and have the same interface methods but as it turns out, HashMap and ConcurrentHashMap treat a null key totally different. In order to enforce these classes being interchangeable they couldn't have ConcurrentHashMap.get() actually throw an exception. Going into a little more detail, here are the actual Javadoc comments on the get() method calls:

HashMap.get()
Returns:
the value to which this map maps the specified key, or null if the map contains no mapping for this key.
ConcurrentHashMap.get()
Returns:
the value to which the key is mapped in this table; null if the key is not mapped to any value in this table.
Throws:
NullPointerException - if the key is null.

All of the sudden someone decided they would throw an uncaught exception? Certainly, there must be a reason behind it. I decided to look at the Java source code:

ConcurrentHashMap:

public V get(Object key) {

int hash = hash(key); // throws NullPointerException if key null

...


HashMap:

public V get(Object key) {

if (key == null)

return getForNullKey();

int hash = hash(key.hashCode());

....

It turns out it comes down to a small null check in one method and the same check missing in the other. The Sun Java developers might have had a good reason for this, but it misses the point. With OOP there is nothing more important then truth in advertising. When you claim one thing and do another that is far more dangerous then simply saying "Use at your own risk".

I find this is common in a large number of APIs and it really seems to show it's ugly head when those API's update. Sometimes this is because the developer updating the API makes some bad assumption about how people use it, sometimes its a technical restriction and sometimes it's just a mistake. What is really scary about this one is that it is a runtime exception. That evil type of exception that normally sneaks through (even with good unit tests) and then hits you in production. Users beware!

2 comments:

Neil Coffey said...

Yes, this is definitely irritating. In general with the collections classes, whether or not they allow null is subject to individual implementations, and does break somewhat with the notion of "always referring to objects by their interface".

In this case, I'd say the broken design was allowing null keys in the regular HashMap in the first place -- when do you really need that functionality...?

Scanningcrew said...

I absolutely agree with you Neil. I would say ConcurrentHashMap has the proper functionality. In general, I believe the collection classes (minus the Stack extends Vector choice) are a great set of abstract data types and a very useful set of implementation classes. It's always interesting to find strange behavior like this.