httpd-docs mailing list archives

From Yoshiki Hayashi <>
Subject Re: Shift_Jis for generated japanese output?
Date Tue, 23 Mar 2004 07:01:32 GMT
André Malo <> writes:

>> Not really.  Iso-2022-jp is the most auto-detection friendly
>> encoding because of infamous escape sequence but shift_jis
>> would be OK, too.  I'm +-0 on conversion at the moment
>> because it hasn't caused me much trouble so far.
> Well, then we should stick with it.
> I find it just annoying that the recoding is not stable (i.e. the xalan
> serializer output differs from version to version and depending on other
> things like moon phases or so :).

From my experience, it was the Java version that mattered.
After I upgraded to JDK 1.4, I no longer have problems with
the encoding.  The reason this happens is that iso-2022-jp is
a stateful encoding.  After an escape sequence, the following
bytes are interpreted in a certain state.  After one escape
sequence, bytes are interpreted as ASCII characters, and after
another, they are interpreted as Japanese characters.  Because
of this, you can have bogus escape sequences, like switching
to another state and then immediately going back to the
previous state.  There were lots of these sequences when I was
using JDK 1.2.  I'm hoping this won't happen anymore since all
files are re-encoded with the newer JDK.  If this happens
again, I would give +1 for changing the generated files to
shift_jis.
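
To illustrate the bogus escape sequences described above, here is
a rough sketch in Python (not the Java/xalan toolchain used for the
docs build; the sample text and the helper name add_bogus_escapes
are made up for illustration).  It prepends a switch to the
JIS X 0208 state that is immediately undone by a switch back to
ASCII, and shows that the decoded text is unchanged even though
the bytes differ.

```python
# Escape sequences defined by ISO-2022-JP.
ESC_TO_JIS_X_0208 = b"\x1b$B"  # following bytes are two-byte Japanese characters
ESC_TO_ASCII = b"\x1b(B"       # following bytes are ASCII characters


def add_bogus_escapes(encoded: bytes) -> bytes:
    """Hypothetical helper: prepend a state switch that is undone at once.

    Assumes `encoded` starts in the ASCII state, which is the initial
    state of every ISO-2022-JP stream.
    """
    return ESC_TO_JIS_X_0208 + ESC_TO_ASCII + encoded


# Sample text (an assumption, not taken from the docs): "Apache" plus two kanji.
text = "Apache \u6587\u66f8"

canonical = text.encode("iso2022_jp")
bloated = add_bogus_escapes(canonical)

print(canonical)
print(bloated)
# Both byte strings decode to the same text, so a serializer is free
# to emit either one.
print(bloated.decode("iso2022_jp") == canonical.decode("iso2022_jp"))
```

Because both byte strings are valid encodings of the same text,
two serializer versions can produce different files from the same
source.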

> (though... actually I don't know if shift_jis would make the things better)

Yes, shift_jis would make it easier because it's a plain
8-bit character encoding scheme with no escape sequences or
shift states.  There is only one way to encode a character.
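
As a rough check of this point, the following Python sketch (again
not the Java toolchain used for the docs; the sample text and the
helper name reencode are made up for illustration) decodes and
re-encodes the same text in both encodings.  The shift_jis bytes
contain no escape byte and survive the round trip unchanged, while
iso-2022-jp input carrying a bogus escape pair comes back as
different bytes.

```python
def reencode(data: bytes, codec: str) -> bytes:
    """Hypothetical helper: decode bytes and encode them again."""
    return data.decode(codec).encode(codec)


# Sample text (an assumption, not taken from the docs): "Apache" plus two kanji.
text = "Apache \u6587\u66f8"

# shift_jis: no state, so no escape byte appears in the output.
sjis = text.encode("shift_jis")
sjis_has_escape = b"\x1b" in sjis
sjis_stable = reencode(sjis, "shift_jis") == sjis

# iso-2022-jp: a switch to JIS X 0208 and straight back to ASCII is
# legal input, but re-encoding does not reproduce it.
noisy_jis = b"\x1b$B\x1b(B" + text.encode("iso2022_jp")
jis_stable = reencode(noisy_jis, "iso2022_jp") == noisy_jis

print(sjis_has_escape, sjis_stable, jis_stable)
```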

Yoshiki Hayashi
