
Bug pattern: multiple ways to represent the same data

There's a class of bugs that comes up when one logical datatype has representations in multiple classes. The best example of this is 1 vs. 1L. Both ones represent the same value, but new Integer(1) is not equal to new Long(1) according to either class's equals() method.

Calling contains(1) on a List<Long> compiles just fine, but it won't ever return true: the argument is autoboxed to an Integer, and no Long is equal to an Integer. The same goes for Map.get() and Set.contains(). Anything that depends on equals() is broken if you mix different types to express 'one'.
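To make that concrete, here's a minimal sketch of the failing lookups. The class and method names are made up for illustration; only the collection calls are the real JDK API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MixedBoxedTypes {
    /** Looks up the int literal 1 in a List<Long> that holds 1L. */
    static boolean listContainsIntOne() {
        List<Long> longs = new ArrayList<>();
        longs.add(1L);
        // contains() takes an Object, so 1 is autoboxed to Integer, not Long.
        return longs.contains(1);
    }

    /** The same lookup using the matching type. */
    static boolean listContainsLongOne() {
        List<Long> longs = new ArrayList<>();
        longs.add(1L);
        return longs.contains(1L);
    }

    /** Map.get() also takes an Object, so an Integer key compiles but misses. */
    static String mapGetWithIntKey() {
        Map<Long, String> names = new HashMap<>();
        names.put(1L, "one");
        return names.get(1);
    }

    public static void main(String[] args) {
        System.out.println(listContainsIntOne());  // false
        System.out.println(listContainsLongOne()); // true
        System.out.println(mapGetWithIntKey());    // null
    }
}
```

Note that nothing here produces a compiler error or a runtime exception. The lookups just quietly miss.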

The problem is that each class defines an equals method that is local to that class. This is a fair design, but as a consequence these types should not be mixed.

A short catalog of tricky types


...that can cause you pain if you mix them. The types in each row can all represent the same logical value, but they generally don't compare as equal to one another, so mixing them will get you burned:
  • "0" : Byte, Short, Integer, Long
  • "0.0" : Float, Double
  • "Jan 1, 1970 12:00:00am UTC" : Date, long, Calendar
  • "http://publicobject.com" : URI, URL
  • "integer type" : int.class, Integer.class
  • "ABC" : StringBuffer, StringBuilder, CharSequence, String
  • "natural order" : Comparators.naturalOrder(), null
  • "String[].class" : GenericArrayType, Class (both of which implement Type)
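A few of these rows can be checked directly. The sketch below uses only JDK calls; the class and method names are hypothetical. It compares the URI against the URL rather than the other way round, which avoids calling URL.equals().

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Date;

class TrickyTypes {
    /** "0" as a Short vs. "0" as an Integer. */
    static boolean shortEqualsInteger() {
        return Short.valueOf((short) 0).equals(Integer.valueOf(0));
    }

    /** The epoch as a Date vs. the epoch as a (boxed) long. */
    static boolean dateEqualsLong() {
        return new Date(0L).equals(0L);
    }

    /** "ABC" as a String vs. "ABC" as a StringBuilder. */
    static boolean stringEqualsStringBuilder() {
        return "ABC".equals(new StringBuilder("ABC"));
    }

    /** contentEquals() compares characters, regardless of the CharSequence type. */
    static boolean stringContentEqualsStringBuilder() {
        return "ABC".contentEquals(new StringBuilder("ABC"));
    }

    /** The primitive int type vs. its wrapper class. */
    static boolean intClassEqualsIntegerClass() {
        return int.class.equals(Integer.class);
    }

    /** The same address as a URI and as a URL. */
    static boolean uriEqualsUrl() {
        try {
            URI uri = URI.create("http://publicobject.com");
            URL url = uri.toURL();
            return uri.equals(url);
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortEqualsInteger());               // false
        System.out.println(dateEqualsLong());                   // false
        System.out.println(stringEqualsStringBuilder());        // false
        System.out.println(stringContentEqualsStringBuilder()); // true
        System.out.println(intClassEqualsIntegerClass());       // false
        System.out.println(uriEqualsUrl());                     // false
    }
}
```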


A shorter catalog of good types


Fortunately, in a few places the JDK has interfaces that dictate how equals and hashCode must be implemented. As a result, you can freely intermix the implementations of each interface:
  • Sets: HashSet, LinkedHashSet, TreeSet
  • Maps: ConcurrentHashMap, HashMap, Collections.emptyMap()
  • Lists: ArrayList, LinkedList, Vector, Arrays.asList

Defining this behaviour for interfaces is somewhat difficult - use the Set, Map and List interfaces as a guide. All implementations must implement the spec exactly or behaviour will be unreliable.
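Here's a small sketch of what that interoperability buys you. The class and method names are made up for illustration; the collections are the ones listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

class InterchangeableCollections {
    /** Different List implementations with the same elements in the same order. */
    static boolean listsAreEqual() {
        List<String> arrayList = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> linkedList = new LinkedList<>(Arrays.asList("a", "b"));
        return arrayList.equals(linkedList)
            && linkedList.equals(Arrays.asList("a", "b"))
            && arrayList.hashCode() == linkedList.hashCode();
    }

    /** Different Set implementations with the same elements. */
    static boolean setsAreEqual() {
        Set<String> hashSet = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> treeSet = new TreeSet<>(hashSet);
        return hashSet.equals(treeSet)
            && treeSet.equals(hashSet)
            && hashSet.hashCode() == treeSet.hashCode();
    }

    /** Different empty Map implementations. */
    static boolean emptyMapsAreEqual() {
        return new HashMap<String, String>().equals(Collections.emptyMap())
            && Collections.emptyMap().equals(new ConcurrentHashMap<String, String>());
    }

    public static void main(String[] args) {
        System.out.println(listsAreEqual());     // true
        System.out.println(setsAreEqual());      // true
        System.out.println(emptyMapsAreEqual()); // true
    }
}
```

Equality is symmetric here because every implementation follows the contract written down on the interface, rather than inventing its own.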

Recommendations


Avoid creating classes that allow one logical datatype to be represented by different classes. If you must, consider writing an interface to specify equals and hashCode at that level.

Choose a preferred, canonical form for your data. For example, if you consider 'null' and 'empty string' to be equal, choose one form and stick to it. Throw an IllegalArgumentException at callers that use the wrong one. If you're using collections, always use the canonical type for inserts and lookups.
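As a rough sketch of that advice, consider a hypothetical LabelRegistry (the class, its methods and its rules are assumptions made up for this example). It picks long as the canonical id type and the empty string as the canonical 'no label' value, and rejects null outright.

```java
import java.util.HashMap;
import java.util.Map;

class LabelRegistry {
    // Keys are always Long: both methods below take a primitive long,
    // so callers cannot sneak an Integer key into the map.
    private final Map<Long, String> labelsById = new HashMap<>();

    /** Stores a label. The canonical 'no label' value is "", never null. */
    void put(long id, String label) {
        if (label == null) {
            throw new IllegalArgumentException(
                "label must not be null; use \"\" for 'no label'");
        }
        labelsById.put(id, label);
    }

    /** Returns the label for id, or "" if there is none. Never returns null. */
    String get(long id) {
        String label = labelsById.get(id); // id is boxed to Long, matching the keys
        return label != null ? label : "";
    }

    public static void main(String[] args) {
        LabelRegistry registry = new LabelRegistry();
        registry.put(1, "one");              // the int literal is widened to long
        System.out.println(registry.get(1)); // one
    }
}
```

Taking a primitive long in the method signature means an int argument is widened by the compiler before it is boxed, so inserts and lookups always agree on the key type.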

Use a smart IDE like IntelliJ IDEA. It'll warn you when you mix types.

An Aside...


It turns out that Guice 1.0 suffered an ugly bug because of this problem. You can represent arrays in two different ways using Java 5's Types API: either as an instance of Class or as an instance of GenericArrayType. The two are equivalent but not equals(). As a consequence, some injections would incorrectly fail with 'missing bindings' exceptions.
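The mismatch can be reproduced outside Guice. In the sketch below, the hand-written GenericArrayType and the canonicalize() helper are assumptions made up for illustration. This is not Guice's code, and not necessarily how Guice fixed the bug.

```java
import java.lang.reflect.Array;
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Type;

class ArrayTypeForms {
    /** A hypothetical, hand-written GenericArrayType that describes String[]. */
    static Type stringArrayAsGenericArrayType() {
        return new GenericArrayType() {
            @Override public Type getGenericComponentType() {
                return String.class;
            }
        };
    }

    /** The two forms describe the same type, but Class.equals() is identity-based. */
    static boolean formsAreEqual() {
        return String[].class.equals(stringArrayAsGenericArrayType());
    }

    /**
     * One possible canonical form: collapse a GenericArrayType into a Class
     * whenever its component type is itself a plain Class.
     */
    static Type canonicalize(Type type) {
        if (type instanceof GenericArrayType) {
            Type component = canonicalize(
                ((GenericArrayType) type).getGenericComponentType());
            if (component instanceof Class) {
                // Build a zero-length array just to obtain its array Class.
                return Array.newInstance((Class<?>) component, 0).getClass();
            }
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(formsAreEqual()); // false
        System.out.println(
            canonicalize(stringArrayAsGenericArrayType()).equals(String[].class)); // true
    }
}
```

Once every Type is passed through a step like canonicalize() before it is used as a key, the two representations land on the same entry.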