Message-Id: <200407271844.i6RIie1t023648@jc-smtp.iij.ad.jp>
Date: Wed, 28 Jul 2004 03:44:37 +0900
From: Hiroaki KAWAI
To: docs@httpd.apache.org
Subject: Re: Japanese transformation is not stable
In-Reply-To: <87fz7dga82.fsf@sodan.org>
References: <87fz7dga82.fsf@sodan.org>

> > Hmm. It still happens, that different JREs (?) produce different iso-2022-jp
> > output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs.
>
> Well, at least mine removes bogus escape sequences and
> produce more desirable output but yeah, it still happens.

Over the last few months I've run into several bugs in the Sun JRE's
iso-2022-jp charset converter, but I think the converter will be
stable soon.

I work on the input XML files and don't watch the generated HTML
files. I feel the diffs in the generated HTML are less important than
those in the XML.
For now I can simply ignore the diffs in the generated HTML files.
I don't understand what harm those diffs do to us, so could you
explain?

> > I'd suggest to switch the transformation finally to shift_jis, which is more
> > stable (because there are none of these problematic escape sequences).
>
> I'd rather use euc-jp than shift_jis. For one thing,
> shift_jis is a nightmare for auto detection since almost all
> byte sequence can represent a valid character. If I choose
> from three major character encoding scheme in Japan, I
> always choose euc-jp. It doesn't have quirks sjis has. The
> fact that current one uses iso-2022-jp is just from legacy
> reasons.

IMHO, whichever charset we choose, we will face this kind of problem
to some degree.
# Personally, I prefer UTF-8. :-)
## Because it covers a wide range of characters.

However, shift_jis is actually a worse choice, because there are
well-known issues around the Shift_JIS and CP932 charsets: the alias
definitions have changed repeatedly between Java releases.

I have no strong preference for any particular charset (except that
it should not be shift_jis).

---Hiroaki Kawai
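
As a rough illustration of the points above, the sketch below encodes a
short sample string in each charset mentioned in this thread. It uses
Python's built-in codecs rather than the Sun JRE converters, so it only
shows what the encodings look like as bytes, not the JRE bugs
themselves. The sample text and the function names are assumptions made
for the example. The wave dash (U+301C) round trip is one commonly cited
Shift_JIS/CP932 mismatch and may not be the specific issue meant above.

```python
# Sketch only: Python's codecs stand in for the JRE converters discussed
# in this thread; the sample text is an assumption for illustration.
SAMPLE = "\u65e5\u672c\u8a9e"  # the word "nihongo" (Japanese language) in kanji


def encode_sample(text):
    """Encode text in each charset mentioned in the thread."""
    codecs = ("iso2022_jp", "euc_jp", "shift_jis", "utf-8")
    return {name: text.encode(name) for name in codecs}


def roundtrip_via_cp932(ch):
    """Encode as Shift_JIS, then decode the same bytes as CP932."""
    return ch.encode("shift_jis").decode("cp932")


if __name__ == "__main__":
    # Print the raw bytes of each encoding as space-separated hex.
    for name, data in encode_sample(SAMPLE).items():
        print(f"{name:<10} {data.hex(' ')}")

    # Same bytes, different code point depending on the charset name used.
    wave = "\u301c"  # WAVE DASH
    back = roundtrip_via_cp932(wave)
    print(f"U+{ord(wave):04X} -> U+{ord(back):04X}")
```

In this sketch the iso2022_jp bytes start with ESC $ B and end with
ESC ( B, which are the kind of escape sequences the quoted messages
refer to; the euc_jp and shift_jis bytes contain no escape sequences.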