From: André Malo
To: docs@httpd.apache.org
Subject: Re: Japanese transformation is not stable
Date: Tue, 27 Jul 2004 22:18:40 +0200
Message-Id: <20040727221840.3a51b0b1@parker>
In-Reply-To: <200407271844.i6RIie1t023648@jc-smtp.iij.ad.jp>
References: <87fz7dga82.fsf@sodan.org> <200407271844.i6RIie1t023648@jc-smtp.iij.ad.jp>

* Hiroaki KAWAI wrote:

> > > Hmm. It still happens, that different JREs (?) produce different
> > > iso-2022-jp output (i.e. any time someone builds all and diffs, he gets
> > > .ja.jis diffs.
> >
> > Well, at least mine removes bogus escape sequences and
> > produce more desirable output but yeah, it still happens.
>
> Last few months, I've encountered some bugs of the implementation of
> iso-2022-jp charset converter of Sun JRE, but the converter will be soon
> stable I think.
> I'm working on the input XML files, and I'm not watching the generated
> html files. I feel the diffs of the htmls are not so important than
> those of the xmls.
> I can just ignore the diffs of generated html files right now.
>
> Well, I don't understand what the diffs do harm to us, so can I ask some
> reasons?

The problem is that someone who builds the whole tree gets Japanese diffs,
and most people just cannot decide whether they did something wrong or not
(I can, because I've glanced over the accompanying RFC ;-)

The second, though less important, reason is that I'm currently working on
restructuring the docs to create a better platform for translators, which
includes rewriting the styles and the build tools. I'm using the resulting
diffs to check whether something went wrong.

> > > I'd suggest to switch the transformation finally to shift_jis, which is
> > > more stable (because there are none of these problematic escape
> > > sequences).
> >
> > I'd rather use euc-jp than shift_jis. For one thing,
> > shift_jis is a nightmare for auto detection since almost all
> > byte sequence can represent a valid character. If I choose
> > from three major character encoding scheme in Japan, I
> > always choose euc-jp. It doesn't have quirks sjis has. The
> > fact that current one uses iso-2022-jp is just from legacy
> > reasons.
>
> IMHO, whatever charset we choose, more or less, we will face this kind of
> problem.
> # I, myself prefer UTF8. :-)
> ## Because it support wide area of characters.

UTF-8 is cool, but too large for the resulting HTML pages. A two-byte
encoding is much smaller, and any wider range of characters one needs is
supported by HTML itself (&#xxx;).
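
A rough sketch of the size argument, in Python. The language, the sample
string, and the function names are my own assumptions for illustration;
none of this comes from the docs build tools. It counts the bytes each
charset needs for the same Japanese text, and shows how a character
outside the target charset can still be emitted as a numeric character
reference:

```python
# Sample text is a made-up example (10 Japanese characters), not taken
# from the httpd documentation.
SAMPLE = "日本語のドキュメント"


def encoded_sizes(text, encodings=("utf-8", "euc_jp", "shift_jis", "iso2022_jp")):
    """Return a mapping of Python codec name to encoded length in bytes."""
    return {name: len(text.encode(name)) for name in encodings}


def encode_with_references(text, encoding="euc_jp"):
    """Encode text; characters the charset lacks become &#NNN; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    for name, size in encoded_sizes(SAMPLE).items():
        print(f"{name:12} {size} bytes")
    # A Hangul syllable is not part of euc-jp, so it is written as a reference.
    print(encode_with_references("日本語 한"))
```

For pure Japanese text, UTF-8 needs three bytes per character where euc-jp
and shift_jis need two; iso-2022-jp adds its escape sequences on top of
that. The markup itself is ASCII in every case, so the difference for a
whole page is smaller than for the text alone.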

> But, shift_jis is actually worse choice because there're well known
> issues around Shift_JIS and CP932 charsets.
> The alias definition changed and changed between the release of Java.

Ok. That's the reason I asked. I had shift_jis in mind, since we're
currently recoding to shift_jis for the CHM files, because the HTML Help
compiler seems to support only this charset for Japanese. If euc-jp is
better for the online pages, we should use it.

If no one objects, I'm going to start the conversion to euc-jp within a few
days.

Just to be clear: this doesn't affect the *source* encoding. Keep it as
you like.

nd
-- 
Flhacs wird im Usenet grundsätzlich alsfhc geschrieben. Schreibt man
lafhsc nicht slfach, so ist das schlichtweg hclafs. Hingegen darf man
rihctig ruhig rhitcgi schreiben, weil eine shcalfe Schreibweise bei
irhictg nicht als shflac angesehen wird.       -- Hajo Pflüger in dnq