From "Andrew McIntyre" <mcintyr...@gmail.com>
Subject Re: Re: translation checker...
Date Thu, 02 Nov 2006 19:00:44 GMT
On 11/2/06, Daniel John Debrunner <djd@apache.org> wrote:
> Hmmm, the documentation for native2ascii does not agree with the
> statement that characters in the 128-255 range are converted
> into Unicode escapes. It says non-Latin-1 characters are converted,
> where Latin-1 is the common name for ISO8859-1.

Then the native2ascii documentation doesn't agree with what
native2ascii actually does. :-)

A quick scan through Derby's translated message files, converted by me
from various encodings using native2ascii, shows that all the
characters above 127 have been converted to Unicode escapes. Grep for
\\u00[bcdef] in the directories with translated properties files to
see examples.
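
To make the observed behaviour concrete, here is a rough Java sketch of
the escaping those files appear to have gone through. It is not
native2ascii's actual source; the class and method names are made up for
illustration, and the lowercase hex digits are an assumption based on
what the converted files look like.

```java
public class EscapeSketch {
    // Hypothetical helper: replace every char above 127 with a
    // backslash-u escape of four hex digits, and leave ASCII untouched.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127) {
                // Assumption: lowercase hex, zero-padded to four digits.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape each command-line argument and print the result.
        for (String arg : args) {
            System.out.println(escapeNonAscii(arg));
        }
    }
}
```

Under that assumption, an e-acute (code point 233, in the upper half of
ISO-8859-1) comes out as an escape starting with \u00e, which is the
kind of thing the grep above turns up.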

Also, I do have vague years-old memories of doing testing of
translated properties files and discovering that characters in the
upper half of the ISO-8859-1 character set, while read properly from
the properties file, were not displayed properly when output to the
console. These sorts of problems might be fixed now, might not,
probably depends on your JVM. Since I've only ever tested with ASCII
properties files since then, I wouldn't know for sure. :-)

Anyway, I think what we really want to catch are files that haven't
been run through native2ascii and are in some encoding that definitely
won't work, like UTF-8 or SJIS. Bytes in the file with a value > 127
are one sign that that might be the case. There's probably a better
way to figure out whether a file is in something other than ASCII or an
ISO8859 encoding, but it may be more complicated than what we need. I'll go do
some searching around that.
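
As a first pass, the "any byte > 127" check could look something like
the sketch below. This is only an illustration of the idea, not a
proposed patch; the class and method names are placeholders, and it
deliberately does nothing smarter than report the first suspicious byte
in each file named on the command line.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NonAsciiCheck {
    // Hypothetical helper: returns the offset of the first byte with a
    // value > 127, or -1 if every byte in the stream is plain ASCII.
    static long firstNonAsciiOffset(InputStream in) throws IOException {
        long offset = 0;
        int b;
        // read() returns the byte as an int in the range 0-255, or -1 at EOF.
        while ((b = in.read()) != -1) {
            if (b > 127) {
                return offset;
            }
            offset++;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            try (InputStream in =
                     new BufferedInputStream(Files.newInputStream(Paths.get(name)))) {
                long offset = firstNonAsciiOffset(in);
                if (offset >= 0) {
                    System.out.println(name + ": byte > 127 at offset " + offset
                            + " (possibly not run through native2ascii)");
                }
            }
        }
    }
}
```

A hit only means the file is worth a look: it cannot tell UTF-8 or SJIS
apart from a file that simply contains raw upper-half ISO-8859-1
characters.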


