db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: translation checker...
Date Thu, 02 Nov 2006 19:36:16 GMT
Andrew McIntyre wrote:

> A quick scan through Derby's translated message files, converted by me
> from various encodings using native2ascii, shows that all the
> characters above 128 have been converted to Unicode Escapes. Grep for
> \\u00[bcdef] in the directories with translated properties files to
> see examples.

Looking further, I now see that the properties file format is more than just
ISO8859-1 encoding with Unicode escapes. The javadoc for
Properties.store says much more about which characters are escaped,
including that:

"Characters less than \u0020 and characters greater than \u007E are 
written as \uxxxx for the appropriate hexadecimal value xxxx. "
This matches what Andrew sees in the Derby files.
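
To see the javadoc's rule in action, here is a minimal sketch that stores one property with a non-ASCII value and inspects the bytes that come out. The class and method names (StoreEscapeDemo, storeToBytes, hasByteAbove7E) are made up for illustration; only java.util.Properties.store is the real API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of Derby.
class StoreEscapeDemo {

    // Store a single key/value pair and return the raw bytes written.
    static byte[] storeToBytes(String key, String value) throws IOException {
        Properties props = new Properties();
        props.setProperty(key, value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.store(out, null); // null: no leading comment text
        return out.toByteArray();
    }

    // True if any byte in the stored form is above 0x7e.
    static boolean hasByteAbove7E(byte[] data) {
        for (byte b : data) {
            if ((b & 0xff) > 0x7e) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // The value ends in e-acute (U+00E9), which is above 0x7e.
        byte[] raw = storeToBytes("greeting", "caf\u00e9");
        // The printed text shows the escaped form of the character.
        System.out.print(new String(raw, StandardCharsets.ISO_8859_1));
        System.out.println("byte above 0x7e present: " + hasByteAbove7E(raw));
    }
}
```

Note that the stored output still contains line terminators, which are below 0x20 in the raw stream, so a raw byte check has to allow for them.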

So any checks should be driven off that description only, and 
native2ascii and the JLS have no relevance.

So checking for non ASCII byte values in the raw stream is the right 
general idea, but the details need to be more specific, e.g. I think any 
characters in the range 0x00-0x1f (which are ASCII) and 0x7f-0xff are 
invalid, and there may be others.
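
A rough sketch of such a raw-stream check, under stated assumptions: the ranges are the ones guessed at above (0x00-0x1f and 0x7f-0xff), and tab, line feed, carriage return and form feed are exempted because a properties file cannot avoid line terminators. That exemption is my assumption, not something the javadoc quote spells out. The names RawPropertiesCheck, isSuspect and findSuspectOffsets are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker, not part of Derby.
class RawPropertiesCheck {

    // b is an unsigned byte value (0-255) as returned by InputStream.read().
    static boolean isSuspect(int b) {
        // Assumption: whitespace and line terminators are allowed as raw bytes.
        if (b == '\t' || b == '\n' || b == '\r' || b == '\f') {
            return false;
        }
        // Everything else outside 0x20-0x7e is flagged.
        return b < 0x20 || b > 0x7e;
    }

    // Return the byte offsets of every suspect byte in the stream.
    static List<Long> findSuspectOffsets(InputStream in) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (isSuspect(b)) {
                offsets.add(pos);
            }
            pos++;
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RawPropertiesCheck some_messages.properties
        try (InputStream in = new FileInputStream(args[0])) {
            List<Long> offsets = findSuspectOffsets(in);
            System.out.println(args[0] + ": " + offsets.size()
                    + " suspect byte(s) at offsets " + offsets);
        }
    }
}
```

The check reads bytes rather than characters on purpose, so no decoding step can hide a bad byte.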

Thanks,
Dan.



