From: Nathan Beyer
To: dev@harmony.apache.org
Date: Thu, 16 Jul 2009 20:26:21 -0500
Subject: Re: Shall we change our file.encoding
Message-ID: <3b3f27c60907161826j1fe179eaq25d50c29cbe80332@mail.gmail.com>
In-Reply-To: <5948b71e0907161818s4937a9e9g7bcba39e959c2d9d@mail.gmail.com>

On Thu, Jul 16, 2009 at 8:18 PM, Charles Lee wrote:
> Hi Nathan,
>
> What I got is 936, the code page identifier. Is there an API for us to map
> 936 to gb2312?

Oh, the 'identifier' bit was missing - yeah, we'll need to translate
that into a name of some sort. I'll poke around a bit and see what I
can find.

> If we put 936 in file.encoding, can we successfully get the encoder and
> decoder by charset?
>
> On Fri, Jul 17, 2009 at 9:05 AM, Nathan Beyer wrote:
>
>> On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee wrote:
>> > Hi guys,
>> >
>> > I have added the locale function in the drlvm, the patch is attached.
>> > Please try this new patch on Linux.
>> >
>> > The patch should work on Linux but fail on Windows, because Windows
>> > returns a code page, not a charset, from setlocale.
>>
>> Code page and character set are the same thing. We shouldn't need to
>> convert it as the Charset APIs will have to support the values anyway.
>>
>> What's the value you're getting? If it's 'Cp1252', then we're good, as
>> that's just an alias for 'Windows-1252' (or vice-versa).
>>
>> -Nathan
>>
>> > I have tried for a long time to get the charset name from the code
>> > page, for example:
>> > CPINFOEX cpInfoEx;
>> > BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
>> > if (iReturn > 0) {
>> >     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>> > }
>> > But I only get the full name, without any fixed format.
>> >
>> > There is a map of code page identifiers in the MSDN, detail here. I may
>> > hard-code this map in the file. But the note on the MSDN says:
>> >     "ANSI code pages can be different on different computers, or can be
>> > changed for a single computer, leading to data corruption. For the most
>> > consistent results, applications should use Unicode, such as UTF-8 or
>> > UTF-16, instead of a specific code page."
>> > I am afraid hard-coding will fail on some machines. (By the way, it seems
>> > UTF-8 is suggested as the default again :-)
>> >
>> > There is also a class Encoding in VC++, detail here. But we cannot
>> > use it here.
>> >
>> > So does anyone know something about locale on Windows?
>> > Again, shall we use UTF-8 as our default?
>> >
>> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee wrote:
>> >>
>> >> It seems we should add it in the drlvm.
>> >>
>> >> On Wed, Jul 15, 2009 at 1:58 PM, Regis wrote:
>> >>>
>> >>> Nathan Beyer wrote:
>> >>>>
>> >>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>> >>>> DRLVM?
>> >>>
>> >>> Yes. I only tested on Linux; IBM VME sets the property correctly.
>> >>>
>> >>>>
>> >>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis wrote:
>> >>>>>
>> >>>>> Kevin Zhou wrote:
>> >>>>>>
>> >>>>>> Yea, from luniglob.c, CL attempts to read the "file.encoding"
>> >>>>>> property from the VM but fails to get the correct encoding.
>> >>>>>>
>> >>>>>> Regis, do you know any other specific ways that CL can get the
>> >>>>>> right property?
>> >>>>>
>> >>>>> We can get it from the OS directly. Maybe just read env variables
>> >>>>> on Linux?
>> >>>>>
>> >>>>>> On Wed, Jul 15, 2009 at 9:59 AM, Regis wrote:
>> >>>>>>
>> >>>>>>> Charles Lee wrote:
>> >>>>>>>
>> >>>>>>>> Hi Nathan,
>> >>>>>>>>
>> >>>>>>>> If the file encoding derives from the OS, there must be some bug
>> >>>>>>>> in it, because on my Linux machine the locale is en_US.UTF-8. Our
>> >>>>>>>> default codec is still ISO8859-1. Do you know where we can find
>> >>>>>>>> such code?
>> >>>>>>>>
>> >>>>>>> Classlib expected the VM to do this and set the property, but it
>> >>>>>>> didn't, so we have to do this by ourselves.
>> >>>>>>>
>> >>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer wrote:
>> >>>>>>>>
>> >>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>> >>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>> >>>>>>>>>
>> >>>>>>>>> Sent from my iPhone
>> >>>>>>>>>
>> >>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee wrote:
>> >>>>>>>>>
>> >>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for RI, and it
>> >>>>>>>>>>> sounds reasonable.
>> >>>>>>>>>>> BTW, it may encounter some compatibility problems, maybe we need
>> >>>>>>>>>>> to run more tests to verify?
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2009/7/14 Charles Lee
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi guys:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am running some of the ant junit test cases and meeting
>> >>>>>>>>>>>> some encoding problems. I find they may be caused by the
>> >>>>>>>>>>>> different default encodings of RI and harmony. My locale is
>> >>>>>>>>>>>> en_US.UTF-8, RI default is UTF-8 but harmony is 8859-1. And
>> >>>>>>>>>>>> then I have encountered HARMONY-3736
>> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/HARMONY-3736>,
>> >>>>>>>>>>>> and the two diffs attached on that issue. It seems we always
>> >>>>>>>>>>>> get 8859-1.
>> >>>>>>>>>>>> Because: (correct me if wrong :-)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 1. we removed the set code in the vm. we will always get null
>> >>>>>>>>>>>> if we call the vm method
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2. we set the file.encode in the libglob.c, if we got null
>> >>>>>>>>>>>> from vm, we set
>> >>>>>>>>>>
>> >>>>>>>>>> Sorry, it should be luniglob.c
>> >>>>>>>>>>
>> >>>>>>>>>>>> 8859-1.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 3. we cannot set file.encode at run time.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> ant uses UTF-8 to encode filenames that contain non-ASCII
>> >>>>>>>>>>>> characters.
>> >>>>>>>>>>>> So why do we use iso8859-1 as our unchangeable default?
>> >>>>>>>>>>>> From the wiki http://en.wikipedia.org/wiki/ISO8859-1, it says
>> >>>>>>>>>>>> "In computing applications, encodings that provide full UCS
>> >>>>>>>>>>>> support (such as UTF-8 and UTF-16) are finding increasing
>> >>>>>>>>>>>> favor over encodings based on ISO 8859-1." Should we simply
>> >>>>>>>>>>>> change iso8859-1 to utf-8?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Yours sincerely,
>> >>>>>>>>>>>> Charles Lee
>> >>>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best Regards!
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jimmy, Jing Lv
>> >>>>>>>>>>> China Software Development Lab, IBM
>> >>>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Yours sincerely,
>> >>>>>>>>>> Charles Lee
>> >>>>>>>>>>
>> >>>>>>> --
>> >>>>>>> Best Regards,
>> >>>>>>> Regis.
>> >>>>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best Regards,
>> >>>>> Regis.
>> >>>>>
>> >>>
>> >>> --
>> >>> Best Regards,
>> >>> Regis.
>> >>
>> >> --
>> >> Yours sincerely,
>> >> Charles Lee
>> >>
>> >
>> > --
>> > Yours sincerely,
>> > Charles Lee
>> >
>
> --
> Yours sincerely,
> Charles Lee
>
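
On the open question of turning a code page identifier such as 936 into a
name the Charset API can look up, here is a minimal sketch in C of the
hard-coded map idea from the thread. Everything in it is an assumption made
for illustration: the helper name charset_name_for_code_page, the cp_entry
struct, the handful of table rows (a small excerpt in the spirit of the MSDN
identifier list), the choice of "GBK" for 936, and the "Cp<id>" fallback.
None of it is taken from Harmony or DRLVM.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper for illustration only; not Harmony or DRLVM code. */
struct cp_entry {
    unsigned int code_page;   /* Windows code page identifier */
    const char *charset;      /* charset name to hand to the Charset API */
};

/* Small illustrative excerpt of a code page identifier map. */
static const struct cp_entry CP_TABLE[] = {
    {   936, "GBK"          },  /* MSDN labels this page "gb2312" */
    {  1252, "windows-1252" },
    { 20127, "US-ASCII"     },
    { 28591, "ISO-8859-1"   },
    { 65001, "UTF-8"        },
};

/*
 * Returns a charset name for the given code page identifier.
 * Known identifiers return a static string from the table. Unknown ones
 * are formatted into buf as "Cp<id>" (an assumed alias convention).
 * Returns NULL if the fallback name does not fit into buf.
 */
const char *charset_name_for_code_page(unsigned int code_page,
                                       char *buf, size_t buf_len)
{
    size_t i;
    int written;

    for (i = 0; i < sizeof CP_TABLE / sizeof CP_TABLE[0]; i++) {
        if (CP_TABLE[i].code_page == code_page) {
            return CP_TABLE[i].charset;
        }
    }

    if (buf == NULL || buf_len == 0) {
        return NULL;
    }

    /* Assumption: unknown pages fall back to the "Cp<id>" alias form. */
    written = snprintf(buf, buf_len, "Cp%u", code_page);
    if (written < 0 || (size_t)written >= buf_len) {
        return NULL;  /* buffer too small for the fallback name */
    }
    return buf;
}
```

On Windows the identifier would probably come from GetACP(), so a call might
look like charset_name_for_code_page(GetACP(), buf, sizeof buf) with a small
char buffer. Whether the class library accepts the returned name still has to
be checked against the Charset aliases it actually registers, and the MSDN
caveat quoted above about ANSI code pages differing between machines applies
to any table of this kind.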