SimpleDateFormat considered harmful

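There's been some discussion on the lists at work about SimpleDateFormat. Apparently it's not safe to be used by multiple threads simultaneously, despite the fact that the format() method doesn't appear to mutate the state of the object. Under the hood it does: format() sets the time on a Calendar field held by the instance, so two threads formatting at once can trample each other's dates. It's a huge design flaw that has bitten many a developer in the ass.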

The discussion turned to possible workarounds and speculation on their comparative performance. Here are the contenders and my speculation:
  • Create a new SimpleDateFormat instance on every invocation. I predicted that this approach would be slow because of the computational expense of parsing the format string.
  • Synchronize on a single shared SimpleDateFormat instance. I thought this approach would be fastest.
  • Use some sort of simple pooling mechanism to share multiple SimpleDateFormats between the threads. I figured this approach would be slow because there would be a need for two synchronized blocks: one for fetching objects from the pool and another for returning them.
  • Use ThreadLocal, where each Thread has a private SimpleDateFormat instance. I figured this approach would be slow because ThreadLocal probably had some complexity in its implementation.
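For anyone who wants to see the four contenders side by side, here is a minimal sketch of what each might look like. This is not the benchmark code itself: the class name DateFormatters, the method names, and the format pattern are all made up for illustration, and it assumes Java 8 or later (for ThreadLocal.withInitial).

```java
import java.text.SimpleDateFormat;
import java.util.ArrayDeque;
import java.util.Date;
import java.util.Deque;

// Hypothetical class; names and pattern are assumptions for illustration.
public class DateFormatters {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // 1. New instance on every invocation: the pattern is parsed each time.
    static String formatNew(Date d) {
        return new SimpleDateFormat(PATTERN).format(d);
    }

    // 2. One shared instance; every caller takes the same lock.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat(PATTERN);

    static String formatSynchronized(Date d) {
        synchronized (SHARED) {
            return SHARED.format(d);
        }
    }

    // 3. Simple pool: one synchronized block to borrow, another to return.
    private static final Deque<SimpleDateFormat> POOL = new ArrayDeque<>();

    static String formatPooled(Date d) {
        SimpleDateFormat f;
        synchronized (POOL) {
            f = POOL.pollFirst();
        }
        if (f == null) {
            // Pool was empty, so grow it by one formatter.
            f = new SimpleDateFormat(PATTERN);
        }
        try {
            // Formatting happens outside any lock.
            return f.format(d);
        } finally {
            synchronized (POOL) {
                POOL.addFirst(f);
            }
        }
    }

    // 4. ThreadLocal: each thread lazily gets its own private instance.
    private static final ThreadLocal<SimpleDateFormat> LOCAL =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(PATTERN));

    static String formatThreadLocal(Date d) {
        return LOCAL.get().format(d);
    }
}
```

All four return the same string for the same Date; they differ only in how they keep two threads away from the same SimpleDateFormat at the same time.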

Well, having placed my bets on the synchronized shared instance, I decided to write a bit of code and benchmark the four options. Here are the results, as obtained unscientifically on my MacBook Pro:

Implementation                  Runtime (seconds)
new instance each time          9.6
single synchronized instance    3.4
simple pool                     2.5

The results are surprising - ThreadLocal is the clear winner, 50% faster than synchronized(). But don't take my word for it. Critique and tweak and run your own tests.
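If you do want to run your own tests, a rough timing harness could look something like the sketch below. It is only an assumption of how such a test might be structured, not the code behind the numbers in this post: the class FormatBench, the timeMillis helper, the pattern, and the thread and round counts are all hypothetical, and it makes no attempt at JIT warm-up or statistical rigor. It assumes Java 8 or later.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.CountDownLatch;
import java.util.function.Function;

// Hypothetical harness; names and parameters are assumptions for illustration.
public class FormatBench {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Runs `rounds` format calls on each of `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(int threads, int rounds, Function<Date, String> formatter) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // hold all workers until the clock starts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Date d = new Date();
                for (int r = 0; r < rounds; r++) {
                    formatter.apply(d);
                }
            });
            workers[i].start();
        }
        long t0 = System.nanoTime();
        start.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        SimpleDateFormat shared = new SimpleDateFormat(PATTERN);
        ThreadLocal<SimpleDateFormat> local =
                ThreadLocal.withInitial(() -> new SimpleDateFormat(PATTERN));

        System.out.println("new instance each time: "
                + timeMillis(8, 10_000, d -> new SimpleDateFormat(PATTERN).format(d)) + " ms");
        System.out.println("single synchronized instance: "
                + timeMillis(8, 10_000, d -> {
                    synchronized (shared) {
                        return shared.format(d);
                    }
                }) + " ms");
        System.out.println("ThreadLocal: "
                + timeMillis(8, 10_000, d -> local.get().format(d)) + " ms");
    }
}
```

Expect the absolute numbers to vary a lot with JVM version, core count, and thread count, so compare the approaches against each other on the same machine rather than against the figures here.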
I figured I'd just show you the performance characteristics on a WinXP JVM (with an older Intel processor):

new instance each time: 17.5s
single synchronized instance: 5.0s
simple pool: 4.0s
ThreadLocal: 3.5s
I was surprised to see that the ThreadLocal approach even outperformed Jakarta Commons Lang's FastDateFormat.

With 100 threads and 10000 rounds I got the following results:

new instance each time: 103.6s
single synchronized instance: 31.3s
simple pool: 21.2s
ThreadLocal: 21.0s
FastDateFormat: 22.1s