JEP 112: Charset Implementation Improvements
Summary
Improve the maintainability and performance of the standard and extended charset implementations.
Motivation
- Decrease the size of installed charsets
- Reduce maintenance cost by generating charset implementations at build time from simple text-based mapping tables
- Improve the performance of encoding/decoding
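As a rough illustration of the mapping-table goal, the sketch below builds a single-byte decode table from a small text table. The table format (two hex fields per line, `#` for comments) and the names `MappingTableSketch`, `buildDecodeTable` and `decode` are assumptions made for this example; they are not the JDK's actual mapping file format or build tooling, and the sketch does the work at run time rather than at build time.

```java
import java.util.Arrays;

class MappingTableSketch {
    // Parses a hypothetical table: one "byteValue unicodeValue" pair per line,
    // both in hex, with '#' starting a comment.
    static char[] buildDecodeTable(String table) {
        char[] b2c = new char[256];
        Arrays.fill(b2c, '\uFFFD'); // unmapped bytes decode to U+FFFD
        for (String line : table.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            int b = Integer.parseInt(fields[0], 16);
            int c = Integer.parseInt(fields[1], 16);
            b2c[b] = (char) c;
        }
        return b2c;
    }

    // Decodes bytes with a table lookup, one char per byte.
    static String decode(char[] b2c, byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(b2c[b & 0xFF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String table = "41 0041  # LATIN CAPITAL LETTER A\n"
                     + "80 20AC  # EURO SIGN\n";
        char[] b2c = buildDecodeTable(table);
        String decoded = decode(b2c, new byte[] {0x41, (byte) 0x80});
        System.out.println(decoded.equals("A\u20AC"));
    }
}
```

Keeping the mapping data in a text table means a fix to one code point is a one-line edit to the table, not a change to hand-written charset code.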
Description
This is the second part of the sun.nio.cs/ext re-implementation project. In JDK 7 most of the charsets (80%+) were re-implemented to achieve better maintainability and performance. This JEP continues that work to:
- Re-implement the remaining charsets, mainly the JIS_X_0208/0212 based Japanese charsets and a couple of IBM double-byte charsets such as IBM964 and IBM33722.
- Implement the sun.nio.cs.ArrayDecoder/Encoder API for the most frequently used double-byte charsets to enhance new String(byte[]) and String.getBytes() performance.
- Improve the start-up/access performance of the standard and extended charsets providers.
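The String conversions named in the list above can be exercised through the public API, as in the minimal sketch below. The sun.nio.cs.ArrayDecoder/Encoder interfaces are JDK internals and are not called directly here. The class name `DoubleByteRoundTrip` and the helper `roundTrip` are invented for this example, and EUC-JP is only assumed to be installed, so the sketch falls back to UTF-16BE when it is not.

```java
import java.nio.charset.Charset;

class DoubleByteRoundTrip {
    // Encodes and then decodes text through the String convenience methods.
    static String roundTrip(String text, Charset cs) {
        byte[] encoded = text.getBytes(cs); // String.getBytes path
        return new String(encoded, cs);     // new String(byte[]) path
    }

    public static void main(String[] args) {
        // Assumption: EUC-JP is available; it is an extended charset and
        // may be missing from a minimal runtime image.
        String name = Charset.isSupported("EUC-JP") ? "EUC-JP" : "UTF-16BE";
        Charset cs = Charset.forName(name); // lookup goes through the charset providers

        String text = "\u65E5\u672C\u8A9E"; // three Japanese characters
        System.out.println(cs.name() + " encoded length: " + text.getBytes(cs).length);
        System.out.println("round trip ok: " + roundTrip(text, cs).equals(text));
    }
}
```

The `Charset.forName` call is also where provider lookup cost shows up, which is the start-up/access path mentioned in the last list item.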
Testing
The new implementations must be completely compatible, for each and every code point, with the existing implementations. New automated unit tests, running under the current test framework, will be written to guarantee correctness.
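One possible shape for such a per-code-point check is sketched below. It is not the JDK's actual test; the names `CharsetCompatCheck` and `firstEncodeMismatch` are hypothetical, and the sketch compares only the encoding direction. In a real comparison the two arguments would be the old and new implementations of the same charset; here standard charsets stand in for them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetCompatCheck {
    // Returns the first code point that the two charsets encode differently,
    // or -1 if every code point encodes to identical bytes.
    static int firstEncodeMismatch(Charset reference, Charset candidate) {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip the surrogate range: a lone surrogate is not valid text.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            String s = new String(Character.toChars(cp));
            if (!Arrays.equals(s.getBytes(reference), s.getBytes(candidate))) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Identical charsets: no mismatch expected.
        System.out.println(firstEncodeMismatch(StandardCharsets.ISO_8859_1,
                                               StandardCharsets.ISO_8859_1));
        // Different charsets: reports the first code point where they diverge.
        System.out.println(firstEncodeMismatch(StandardCharsets.US_ASCII,
                                               StandardCharsets.ISO_8859_1));
    }
}
```

A fuller check would also walk the byte sequences each charset accepts and compare the decoded results, since encoding and decoding tables can diverge independently.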