Date: Thu, 16 Jul 2009 15:40:02 +0800
Subject: Re: Shall we change our file.encoding
From: Charles Lee
To: dev@harmony.apache.org

Thanks Alexey, I can move the code to luniglob.c; that is not a big
problem for me. My main question is how to get the charset on Windows.
Any ideas?

On Thu, Jul 16, 2009 at 3:27 PM, Alexey Varlamov
<alexey.v.varlamov@gmail.com> wrote:
> The main point of HARMONY-3736 was: why should any VM care about
> classlib-specific properties? Let classlib do it, not DRLVM.
>
> Regards,
> Alexey
>
> 2009/7/16, Charles Lee:
> > Hi guys,
> >
> > I have added the locale function to DRLVM; the patch is attached.
> > Please try this new patch on Linux.
> >
> > The patch should work on Linux but fail on Windows, because Windows
> > returns a code page, not a charset, from setlocale.
> >
> > I have tried for a long time to get the charset name from the code
> > page, for example:
> >
> >     #include <stdio.h>
> >     #include <windows.h>
> >
> >     int main(void) {
> >         CPINFOEXA cpInfoEx;  /* CP_ACP = current ANSI code page */
> >         if (GetCPInfoExA(CP_ACP, 0, &cpInfoEx))
> >             printf("FULL NAME %s\n", cpInfoEx.CodePageName);
> >         return 0;
> >     }
> >
> > But I only get the full display name, not a charset name in any
> > standard format.
> >
> > There is a code page identifier map on MSDN (details here). I could
> > hard-code this map in the file, but the note on MSDN says:
> >
> >   "ANSI code pages can be different on different computers, or can be
> >   changed for a single computer, leading to data corruption. For the
> >   most consistent results, applications should use Unicode, such as
> >   UTF-8 or UTF-16, instead of a specific code page."
> >
> > I am afraid hard-coding will fail on some machines. (By the way, this
> > again suggests UTF-8 as the default :-)
> >
> > There is also an Encoding class in VC++ (details here), but we cannot
> > use it here.
> >
> > So does anyone know something about locales on Windows? And again,
> > shall we use UTF-8 as our default?
> >
> > On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee wrote:
> > > It seems we should add it to DRLVM.
> > >
> > > On Wed, Jul 15, 2009 at 1:58 PM, Regis wrote:
> > > > Nathan Beyer wrote:
> > > > > Is the IBM VME dealing with this correctly? Do we just need to
> > > > > fix DRLVM?
> > > >
> > > > Yes; I only tested on Linux, and IBM VME sets the property
> > > > correctly.
> > > >
> > > > > On Wed, Jul 15, 2009 at 12:25 AM, Regis wrote:
> > > > > > Kevin Zhou wrote:
> > > > > > > Yeah, in luniglob.c, CL attempts to read the "file.encoding"
> > > > > > > property from the VM but fails to get the correct encoding.
> > > > > > >
> > > > > > > Regis, do you know any other specific way that CL can get the
> > > > > > > right property?
> > > > > >
> > > > > > We can get it from the OS directly.
> > > > > > Maybe just read the environment variables on Linux?
> > > > > >
> > > > > > > On Wed, Jul 15, 2009 at 9:59 AM, Regis wrote:
> > > > > > > > Charles Lee wrote:
> > > > > > > > > Hi Nathan,
> > > > > > > > >
> > > > > > > > > If the file encoding derives from the OS, there must be
> > > > > > > > > some bugs in it, because on my Linux machine the locale
> > > > > > > > > is en_US.UTF-8 but our default codec is still ISO8859-1.
> > > > > > > > > Do you know where we can find that code?
> > > > > > > >
> > > > > > > > Classlib expected the VM to do this and set the property,
> > > > > > > > but it didn't, so we have to do it ourselves.
> > > > > > > >
> > > > > > > > > On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer wrote:
> > > > > > > > > > Are we talking about Windows or Linux? The default file
> > > > > > > > > > encoding should derive from the OS. I believe that's
> > > > > > > > > > defined by the specs.
> > > > > > > > > >
> > > > > > > > > > Sent from my iPhone
> > > > > > > > > >
> > > > > > > > > > On Jul 14, 2009, at 5:51 AM, Charles Lee wrote:
> > > > > > > > > > > On Tue, Jul 14, 2009 at 6:12 PM, Jimmy,Jing Lv wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > Charles, I believe UTF-8 is the default encoding for
> > > > > > > > > > > > the RI, and it sounds reasonable.
> > > > > > > > > > > > BTW, it may cause some compatibility problems; maybe
> > > > > > > > > > > > we need to run more tests to verify?
> > > > > > > > > > > >
> > > > > > > > > > > > 2009/7/14 Charles Lee
> > > > > > > > > > > > > Hi guys:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am running some of the Ant JUnit test cases and
> > > > > > > > > > > > > hitting encoding problems. I think they may be
> > > > > > > > > > > > > caused by the different default encodings of the RI
> > > > > > > > > > > > > and Harmony. My locale is en_US.UTF-8; the RI
> > > > > > > > > > > > > default is UTF-8 but Harmony's is 8859-1. Then I
> > > > > > > > > > > > > came across HARMONY-3736 and the two diffs attached
> > > > > > > > > > > > > to that issue. It seems we always get 8859-1,
> > > > > > > > > > > > > because (correct me if I'm wrong :-):
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. We removed the code that sets it in the VM, so
> > > > > > > > > > > > >    we always get null when we call the VM method.
> > > > > > > > > > > > > 2. We set file.encoding in libglob.c; if we get
> > > > > > > > > > > > >    null from the VM, we set 8859-1.
> > > > > > > > > > >
> > > > > > > > > > > Sorry, it should be luniglob.c
> > > > > > > > > > >
> > > > > > > > > > > > > 3. We cannot set file.encoding at run time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ant uses UTF-8 to encode file names that contain
> > > > > > > > > > > > > non-ASCII characters. So why do we use ISO-8859-1
> > > > > > > > > > > > > as our unchangeable default?
> > > > > > > > > > > > >
> > > > > > > > > > > > > From the wiki (http://en.wikipedia.org/wiki/ISO8859-1):
> > > > > > > > > > > > > "In computing applications, encodings that provide
> > > > > > > > > > > > > full UCS support (such as UTF-8 and UTF-16) are
> > > > > > > > > > > > > finding increasing favor over encodings based on
> > > > > > > > > > > > > ISO 8859-1." Should we simply change ISO-8859-1 to
> > > > > > > > > > > > > UTF-8?
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Yours sincerely,
> > > > > > > > > > > > > Charles Lee
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards!
> > > > > > > > > > > >
> > > > > > > > > > > > Jimmy, Jing Lv
> > > > > > > > > > > > China Software Development Lab, IBM
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Yours sincerely,
> > > > > > > > > > > Charles Lee
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards,
> > > > > > > > Regis.
> > > > > >
> > > > > > --
> > > > > > Best Regards,
> > > > > > Regis.
> > > >
> > > > --
> > > > Best Regards,
> > > > Regis.
> > >
> > > --
> > > Yours sincerely,
> > > Charles Lee
> >
> > --
> > Yours sincerely,
> > Charles Lee

--
Yours sincerely,
Charles Lee