PUBLIC OBJECT

Why there's no String.getCharset()

With Java's String class, there's a two-argument constructor that takes bytes and a charset name:

<pre>`byte[] characters = { 83, 87, 65, 78, 75, 46, 67, 65 };
String myString = new [String(characters, "ISO-8859-1")](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#String(byte[],%20java.lang.String));`</pre>
Symmetrically, you might expect this:
<pre>`byte[] theBytes = myString.[getBytes()](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#getBytes());
String theCharset = myString.getCharsetName();`</pre>
The second line doesn't compile because **there's no _getCharsetName()_ method** on String.

Why? Strings are _always_ sequences of UTF-16 characters, regardless of what charset was used to create them. The `String(byte[], charsetName)` constructor decodes the bytes into UTF-16 characters using the charset as a guide, and then the charset is forgotten.
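To make the decoding step concrete, here is a minimal sketch that decodes the same two bytes under two different charsets. The class name `DecodeDemo` and its helper methods are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`. The resulting Strings differ, and neither remembers which charset produced it.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: treat every byte as one ISO-8859-1 character.
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: interpret the bytes as a UTF-8 sequence.
    static String asUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = { (byte) 0xC3, (byte) 0xA9 };

        String latin1 = asLatin1(bytes); // two characters: U+00C3, U+00A9
        String utf8 = asUtf8(bytes);     // one character: U+00E9

        System.out.println(latin1.length()); // 2
        System.out.println(utf8.length());   // 1
        System.out.println(latin1.equals(utf8)); // false
    }
}
```

The charset only steers how bytes become characters. After construction, both values are ordinary Strings with nothing left to ask about their origin.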

This turns out to be very handy:
<ul><li>Since all Strings use the same charset, there's no need to convert charsets when doing `compareTo()`, `indexOf()` or `equals()`.
</li><li>Once you have a String, you don't need to think about its character set! Charsets and encodings only matter when you're converting between byte[]s and Strings.
</li></ul>
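Both points can be sketched in a few lines. The class name `BoundaryDemo` and its methods are hypothetical, and the example again assumes Java 7 or later for `StandardCharsets`. The same word arrives as two different byte sequences, decodes to equal Strings, and only shows a charset-dependent difference when it is encoded back to bytes.

```java
import java.nio.charset.StandardCharsets;

public class BoundaryDemo {
    // "caf" followed by U+00E9, encoded as ISO-8859-1 (4 bytes).
    static final byte[] LATIN1_BYTES = { 0x63, 0x61, 0x66, (byte) 0xE9 };
    // The same text encoded as UTF-8 (5 bytes).
    static final byte[] UTF8_BYTES = { 0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9 };

    static String fromLatin1() {
        return new String(LATIN1_BYTES, StandardCharsets.ISO_8859_1);
    }

    static String fromUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String a = fromLatin1();
        String b = fromUtf8();

        // No charset conversion is needed to compare or search.
        System.out.println(a.equals(b));         // true
        System.out.println(a.indexOf('\u00E9')); // 3

        // The charset matters again only when going back to bytes.
        System.out.println(a.getBytes(StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(a.getBytes(StandardCharsets.UTF_8).length);      // 5
    }
}
```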

Unfortunately, [some code](http://svn.sourceforge.net/viewcvs.cgi/jwebunit/trunk/jwebunit-core/src/main/java/net/sourceforge/jwebunit/TestContext.java?view=markup&rev=448) in an otherwise awesome [project](http://jwebunit.sourceforge.net/) misunderstands this concept, and that has caused me some grief! Hopefully everything will be resolved soon.

One cause of this problem is that Java developers have been trained to expect that constructor arguments will be used to initialize an object's properties. Perhaps instead of a constructor, Java's designers could have used a simple factory method to make the decoding action more explicit:
<pre>`public static String decodeBytes(byte[] bytes, String charsetName)`</pre>
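As a rough sketch of what such a factory could look like if you wrote it yourself: the holder class `StringDecoding` is hypothetical and not part of the JDK, and this version takes a `Charset` object rather than a charset name, which assumes Java 6 or later (and Java 7 for `StandardCharsets`). It simply delegates to the existing constructor, so only the name changes, not the behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class StringDecoding {
    private StringDecoding() {} // static helpers only

    /**
     * Hypothetical factory method: decodes the bytes using the given charset.
     * The charset guides the decoding and is not stored in the result.
     */
    public static String decodeBytes(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] characters = { 83, 87, 65, 78, 75, 46, 67, 65 };
        String myString = decodeBytes(characters, StandardCharsets.ISO_8859_1);
        System.out.println(myString); // SWANK.CA
    }
}
```

A call site that reads `decodeBytes(characters, charset)` describes an action on the bytes, which makes it harder to mistake the charset for a property of the resulting String.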

For a great overview of why character sets are the way they are, check out Joel Spolsky's article.