commons-dev mailing list archives

From Leandro Reis <lr...@adobe.com>
Subject Re: [io] support for additional character sets needed in ReversedLinesFileReader
Date Tue, 03 Mar 2015 00:02:28 GMT
On 2 March 2015 at 21:53, sebb wrote:

>>On 2 March 2015 at 20:00, Leandro Reis <lreis@adobe.com> wrote:
>>Hi all,
>>
>>I'm working on a product that uses Commons IO via Jackrabbit Oak. While
>>testing the launch of that product on Japanese Windows Server 2012 R2, I
>>came across the following exception:
>>"(java.io.UnsupportedEncodingException: Encoding windows-31j is not
>>supported yet (feel free to submit a patch))"
>>
>>windows-31j is the IANA name for Windows code page 932 (Japanese). It is
>>the value returned by Charset.defaultCharset() on that system, which is
>>what org.apache.commons.io.input.ReversedLinesFileReader [0] uses.
>>
>>It looks like this issue could be addressed by adding a check for
>>"windows-31j" to the ReversedLinesFileReader(final File file, final int
>>blockSize, final Charset encoding) constructor:
>>
>>...
>>} else if (charset.equals(Charset.forName("windows-31j"))) {
>>    byteDecrement = 1;
>>}
>>...
>>
>>Similar changes would be needed to support the Simplified Chinese,
>>Traditional Chinese, and Korean versions of the same OS (I'm checking
>>what the corresponding encoding names are).
>>
>>Can someone familiar with this area of the code confirm whether this is
>>the right approach?
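A minimal sketch of how the proposed check could be extended to all four Windows code pages. It assumes the JDK's canonical charset names windows-31j (code page 932), GBK (936), x-windows-949 (949) and x-windows-950 (950). The ByteDecrementSketch class and its byteDecrementFor helper are hypothetical: they only mirror the shape of the constructor's if/else chain quoted above, collecting the "step one byte" charsets in a Set, and are not Commons IO API. The UTF-8 and Shift_JIS entries are an assumption about what the existing chain already accepts. Comparing with equals() also matches aliases such as MS932, because two Charset objects are equal exactly when their canonical names are equal.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class for illustration; not part of Commons IO.
final class ByteDecrementSketch {

    // Multi-byte charsets assumed to never use 0x0A (LF) or 0x0D (CR) inside a
    // multi-byte sequence, so a backwards newline scan can step one byte at a time.
    private static final Set<Charset> ONE_BYTE_STEP = new HashSet<>(Arrays.asList(
            StandardCharsets.UTF_8,
            Charset.forName("Shift_JIS"),
            Charset.forName("windows-31j"),   // Windows code page 932 (Japanese)
            Charset.forName("GBK"),           // Windows code page 936 (Simplified Chinese)
            Charset.forName("x-windows-949"), // Windows code page 949 (Korean)
            Charset.forName("x-windows-950")  // Windows code page 950 (Traditional Chinese)
    ));

    private ByteDecrementSketch() {
    }

    /** Returns the number of bytes a backwards newline scan should step for the charset. */
    static int byteDecrementFor(Charset charset) throws UnsupportedEncodingException {
        if (charset.newEncoder().maxBytesPerChar() == 1f) {
            return 1; // single-byte encodings are never a problem
        }
        if (ONE_BYTE_STEP.contains(charset)) {
            return 1;
        }
        if (charset.equals(StandardCharsets.UTF_16BE) || charset.equals(StandardCharsets.UTF_16LE)) {
            return 2;
        }
        if (charset.equals(StandardCharsets.UTF_16)) {
            throw new UnsupportedEncodingException(
                    "For UTF-16, you need to specify the byte order (use UTF-16BE or UTF-16LE)");
        }
        throw new UnsupportedEncodingException(
                "Encoding " + charset + " is not supported yet (feel free to submit a patch)");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // MS932 is an alias, so it resolves to the same Charset as windows-31j.
        String[] names = {"windows-31j", "MS932", "GBK", "x-windows-949", "x-windows-950", "UTF-16LE"};
        for (String name : names) {
            System.out.println(name + " -> " + byteDecrementFor(Charset.forName(name)));
        }
    }
}
```

Using a Set keeps each newly supported code page to a one-line change, at the cost of moving away from the existing one-branch-per-charset style.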

>Can a newline byte ever appear as part of a multi-byte character in any
>of those encodings?
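No. In all four code pages, the lead byte of a double-byte character is
0x81 or higher and the trail byte is 0x40 or higher, so neither LF (0x0A)
nor CR (0x0D) can occur inside a multi-byte character. Sources: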
- Japanese: 
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
- Simplified Chinese:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
- Korean: 
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT
- Traditional Chinese:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
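The same claim can be checked empirically against the JDK's own encoders. This is a minimal sketch, assuming the JDK charset names windows-31j, GBK, x-windows-949 and x-windows-950 for the four code pages; the NewlineByteCheck class is hypothetical. It encodes every BMP character on its own and counts those whose encoding is longer than one byte and contains 0x0A or 0x0D. It tests Java's encoder tables, which may differ slightly from the Microsoft mapping files listed above (Java's GBK, for instance, is not byte-for-byte identical to code page 936). UTF-16LE serves as a control, since it does put such bytes inside two-byte units (U+0A00 encodes as 00 0A).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class for checking the claim; not part of Commons IO.
final class NewlineByteCheck {

    private NewlineByteCheck() {
    }

    /**
     * Counts BMP characters whose encoding in the given charset is longer than
     * one byte and contains 0x0A (LF) or 0x0D (CR).
     */
    static int countMultiByteNewlineCollisions(Charset charset) {
        // A fresh encoder reports unmappable characters instead of replacing them.
        CharsetEncoder encoder = charset.newEncoder();
        int collisions = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            char ch = (char) c;
            if (Character.isSurrogate(ch)) {
                continue; // a lone surrogate is not a complete character
            }
            ByteBuffer bytes;
            try {
                // The convenience encode() resets the encoder before each call.
                bytes = encoder.encode(CharBuffer.wrap(new char[] {ch}));
            } catch (CharacterCodingException e) {
                continue; // character not representable in this charset
            }
            if (bytes.remaining() < 2) {
                continue; // only multi-byte encodings are being checked
            }
            for (int i = bytes.position(); i < bytes.limit(); i++) {
                byte b = bytes.get(i);
                if (b == 0x0A || b == 0x0D) {
                    collisions++;
                    break;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        String[] names = {"windows-31j", "GBK", "x-windows-949", "x-windows-950", "UTF-16LE"};
        for (String name : names) {
            System.out.println(name + ": " + countMultiByteNewlineCollisions(Charset.forName(name)));
        }
    }
}
```

Consistent with the answer above, the four code pages should report 0 and the UTF-16LE control a non-zero count.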


>>Thanks,
>> Leandro
>>
>>[0] http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/input/ReversedLinesFileReader.java?view=markup