The XML Editor appears to read UTF-8 encoded files correctly only for some well-known file types such as ‘*.xml’ or ‘*.xsl’, and fails, for instance, with ‘*.xsp’ files containing non-ASCII characters.
Example:
Use the XML Editor to create a small XML file, say ‘x1.xml’. Make sure it starts with a proper XML declaration stating the encoding as UTF-8, and let it contain some non-ASCII characters such as German umlauts (ä etc.). The XML Editor loads and saves this file correctly, and its Eclipse properties show the text file encoding as ‘Default (determined from content: UTF-8)’.
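The content-based detection that works for ‘x1.xml’ does not depend on the file name at all; it only needs the first bytes of the file. A minimal Python sketch, assuming the editor should trust a UTF-8 byte order mark or the XML declaration regardless of extension. `detect_xml_encoding` is a hypothetical helper, not an Eclipse API, and it only handles ASCII-compatible encodings (UTF-16 files are not covered).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?> at the very start of the data.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: guess the encoding from content, ignoring the file name."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"  # UTF-8 byte order mark
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # XML 1.0 treats a file without BOM or encoding declaration as UTF-8
    return fallback


sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<x>\xc3\xa4</x>'
print(detect_xml_encoding(sample))                         # UTF-8
print(sample.decode(detect_xml_encoding(sample))[-5:])     # <x>ä</x> tail
```

With such a check, ‘x1.xsp’ would be decoded exactly like ‘x1.xml’, since both files start with the same declaration.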
Next, copy this file and rename the copy, e.g. to ‘x1.xsp’, then manually set its Eclipse text file encoding to UTF-8. The Eclipse text editor then loads and saves both files correctly.
The XML Editor, however, appears to LOAD the xsp file with the default platform encoding, which turns each non-ASCII character into several ISO characters, but SAVEs it in UTF-8, so every subsequent load/save cycle adds even more garbage characters.
The German ä (ISO-8859-1 code 0xE4) is represented in UTF-8 by the two bytes 0xC3 0xA4. After one load/save cycle, the XML Editor has turned them into four bytes, 0xC3 0x83 0xC2 0xA4; after two cycles, into 0xC3 0x83 0xC6 0x92 0xC3 0x82 0xC2 0xA4…
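The byte sequences above can be reproduced outside Eclipse by decoding with one charset and re-encoding with another. This sketch assumes the platform default is windows-1252 (Cp1252). That assumption is inferred from the second cycle: byte 0x83 becomes 0xC6 0x92, the UTF-8 form of U+0192 ‘ƒ’, which is what Cp1252 maps 0x83 to (strict ISO-8859-1 would give 0xC2 0x83 instead). `load_save_cycle` is a hypothetical name used only for illustration.

```python
def load_save_cycle(data: bytes, load_encoding: str = "cp1252",
                    save_encoding: str = "utf-8") -> bytes:
    """Simulate the reported behaviour: LOAD with one charset, SAVE with another."""
    text = data.decode(load_encoding)  # load with the (assumed) platform default
    return text.encode(save_encoding)  # save as UTF-8


original = "ä".encode("utf-8")
once = load_save_cycle(original)
twice = load_save_cycle(once)
print(original.hex(" "))  # c3 a4
print(once.hex(" "))      # c3 83 c2 a4
print(twice.hex(" "))     # c3 83 c6 92 c3 82 c2 a4
```

When load and save use the same charset, e.g. `load_save_cycle(original, "utf-8", "utf-8")`, the bytes stay unchanged, which matches what the plain text editor does once the file encoding is set to UTF-8.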