harmony-dev mailing list archives

From Nathan Beyer <nbe...@gmail.com>
Subject Re: Shall we change our file.encoding
Date Fri, 17 Jul 2009 01:02:56 GMT
On Thu, Jul 16, 2009 at 2:27 AM, Alexey Varlamov <alexey.v.varlamov@gmail.com> wrote:
> The main point of HARMONY-3736 was: why should any VM care about
> classlib-specific properties? Let classlib do it, not DRLVM.

Can you point to some conversation that backs this up? I looked at
that issue and I don't interpret it like you do.

In any case, it looks like this work should be done on this issue,
since it's what we're talking about -
https://issues.apache.org/jira/browse/HARMONY-3829.
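
For reference, here is a minimal sketch of how the default charset can be read from the OS locale on Linux/POSIX, which is the behaviour being discussed in this thread. It is not taken from any Harmony or DRLVM patch: the helper name default_charset and its fallback argument are assumptions made for illustration only.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not Harmony code): returns the charset name of the
 * process locale, or `fallback` if it cannot be determined. */
const char *default_charset(const char *fallback)
{
    /* A C program starts in the "C" locale; "" adopts the locale that the
     * environment (LC_ALL, LC_CTYPE, LANG) asks for. */
    if (setlocale(LC_CTYPE, "") == NULL) {
        return fallback;
    }

    /* nl_langinfo(CODESET) reports the character encoding of that locale. */
    const char *codeset = nl_langinfo(CODESET);
    if (codeset == NULL || codeset[0] == '\0') {
        return fallback;
    }
    return codeset;
}
```

A caller would pass the value it wants when detection fails, for example default_charset("ISO-8859-1"). The exact spelling of the returned name is platform dependent, so it may still need normalising before it is used as file.encoding.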

-Nathan

>
> Regards,
> Alexey
>
> 2009/7/16, Charles Lee <littlee1032@gmail.com>:
>> Hi guys,
>>
>> I have added the locale function to DRLVM; the patch is attached. Please
>> try this new patch on Linux.
>>
>> The patch should work on Linux but fail on Windows, because on Windows
>> setlocale returns a code page rather than a charset name. I have tried for
>> a long time to get the charset name from the code page, for example:
>> CPINFOEX cpInfoEx;
>> /* GetCPInfoEx returns nonzero on success */
>> BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
>> if (iReturn) {
>>     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
>> }
>> But I only get the full descriptive name, not a name in any standard format.
>>
>> There is a map of code page identifiers on MSDN (details here). I could
>> hard-code this map in the file, but the note on MSDN says:
>>      "ANSI code pages can be different on different computers, or can be
>> changed for a single computer, leading to data corruption. For the most
>> consistent results, applications should use Unicode, such as UTF-8 or
>> UTF-16, instead of a specific code page."
>> I am afraid a hard-coded map will fail on some machines. (By the way, this
>> again seems to suggest UTF-8 as the default :-)
>>
>> There is also an Encoding class in VC++ (details here), but we cannot use
>> it here.
>>
>> So does anyone know something about locales on Windows?
>> Again, shall we use UTF-8 as our default?
>>
>>
>> On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1032@gmail.com> wrote:
>> > It seems we should add it in DRLVM.
>> >
>> > On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.regis@gmail.com> wrote:
>> >
>> > >
>> > > Nathan Beyer wrote:
>> > >
>> > > > Is the IBM VME dealing with this correctly? Do we just need to fix
>> > > > DRLVM?
>> > > >
>> > >
>> > > Yes. I only tested on Linux, where the IBM VME sets the property correctly.
>> > >
>> > > >
>> > > > On Wed, Jul 15, 2009 at 12:25 AM, Regis <xu.regis@gmail.com> wrote:
>> > > >
>> > > > > Kevin Zhou wrote:
>> > > > >
>> > > > > > Yes, from luniglob.c, CL attempts to read the "file.encoding" property
>> > > > > > from the VM but fails to get the correct encoding.
>> > > > > >
>> > > > > > Regis, do you know any other specific way that CL can obtain the right
>> > > > > > property?
>> > > > > >
>> > > > > We can get it from the OS directly. Maybe just read the environment
>> > > > > variables on Linux?
>> > > > >
>> > > > >
>> > > > > > On Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.regis@gmail.com> wrote:
>> > > > > >
>> > > > > >
>> > > > > > > Charles Lee wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > > Hi Nathan,
>> > > > > > > >
>> > > > > > > > If the file encoding is derived from the OS, there must be a bug in it,
>> > > > > > > > because on my Linux machine the locale is en_US.UTF-8 but our default
>> > > > > > > > codec is still ISO8859-1. Do you know where we can find that code?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > Classlib expected the VM to do this and set the property, but it
>> > > > > > > didn't, so we have to do this ourselves.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > > On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbeyer@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Are we talking about Windows or Linux? The default file encoding should
>> > > > > > > > > derive from the OS. I believe that's defined by the specs.
>> > > > > > > > >
>> > > > > > > > > Sent from my iPhone
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1032@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > On Tue, Jul 14, 2009 at 6:12 PM, Jimmy, Jing Lv <firepure@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi,
>> > > > > > > > > > >
>> > > > > > > > > > > Charles, I believe UTF-8 is the default encoding for RI, and it
>> > > > > > > > > > > sounds reasonable.
>> > > > > > > > > > > BTW, it may run into some compatibility problems; maybe we need to
>> > > > > > > > > > > run more tests to verify?
>> > > > > > > > > > >
>> > > > > > > > > > > 2009/7/14 Charles Lee <littlee1032@gmail.com>
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi guys:
>> > > > > > > > > > > >
>> > > > > > > > > > > > I am running some of the Ant JUnit test cases and hitting some
>> > > > > > > > > > > > encoding problems. I think they may be caused by the different
>> > > > > > > > > > > > default encodings of RI and Harmony. My locale is en_US.UTF-8; the RI
>> > > > > > > > > > > > default is UTF-8 but Harmony's is 8859-1. I then came across
>> > > > > > > > > > > > HARMONY-3736 <https://issues.apache.org/jira/browse/HARMONY-3736>
>> > > > > > > > > > > > and the two diffs attached to that issue. It seems we always get
>> > > > > > > > > > > > 8859-1, because (correct me if I am wrong :-)
>> > > > > > > > > > > >
>> > > > > > > > > > > > 1. We removed the set code in the VM, so we always get null when we
>> > > > > > > > > > > > call the VM method.
>> > > > > > > > > > > > 2. We set file.encoding in libglob.c; if we get null from the VM, we
>> > > > > > > > > > > > set 8859-1.
>> > > > > > > > > > >
>> > > > > > > > > > > Sorry, it should be luniglob.c
>> > > > > > > > > > >
>> > > > > > > > > > > > 3. We cannot set file.encoding at run time.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Ant uses UTF-8 to encode file names that contain non-ASCII
>> > > > > > > > > > > > characters. So why do we use ISO8859-1 as our unchangeable default?
>> > > > > > > > > > > > The wiki page http://en.wikipedia.org/wiki/ISO8859-1 says: "In
>> > > > > > > > > > > > computing applications, encodings that provide full UCS support (such as
>> > > > > > > > > > > > UTF-8 <http://en.wikipedia.org/wiki/UTF-8> and
>> > > > > > > > > > > > UTF-16 <http://en.wikipedia.org/wiki/UTF-16>) are finding increasing
>> > > > > > > > > > > > favor over encodings based on ISO 8859-1." Should we simply change
>> > > > > > > > > > > > iso8859-1 to utf-8?
>> > > > > > > > > > > >
>> > > > > > > > > > > > --
>> > > > > > > > > > > > Yours sincerely,
>> > > > > > > > > > > > Charles Lee
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > >
>> > > > > > > > > > > Best Regards!
>> > > > > > > > > > >
>> > > > > > > > > > > Jimmy, Jing Lv
>> > > > > > > > > > > China Software Development Lab, IBM
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > > Yours sincerely,
>> > > > > > > > > > Charles Lee
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > --
>> > > > > > > Best Regards,
>> > > > > > > Regis.
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > > --
>> > > > > Best Regards,
>> > > > > Regis.
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best Regards,
>> > > Regis.
>> > >
>> >
>> >
>> >
>> > --
>> > Yours sincerely,
>> > Charles Lee
>> >
>> >
>>
>>
>>
>> --
>> Yours sincerely,
>> Charles Lee
>>
>>
>>
>
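
On the Windows question quoted above, one possible approach is to skip the descriptive name from GetCPInfoEx and instead map the numeric ANSI code page to a charset name. The sketch below is only an illustration under stated assumptions: GetACP is the Win32 call that returns the ANSI code page identifier, but the helper names (codepage_to_charset, default_charset_win32), the "cpNNN" fallback label and the handful of identifiers covered are all hypothetical choices, not Harmony code, and the MSDN caveat quoted above about code pages differing between machines still applies.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: writes a charset name for a Windows code page
 * identifier into buf and returns the snprintf result. Only a few
 * well-known identifiers are covered; everything else gets "cpNNN". */
int codepage_to_charset(unsigned int cp, char *buf, size_t len)
{
    switch (cp) {
    case 65001: return snprintf(buf, len, "UTF-8");
    case 28591: return snprintf(buf, len, "ISO-8859-1");
    case 20127: return snprintf(buf, len, "US-ASCII");
    default:    break;
    }
    /* Code pages 1250..1258 correspond to the names windows-1250..1258. */
    if (cp >= 1250 && cp <= 1258) {
        return snprintf(buf, len, "windows-%u", cp);
    }
    /* Assumed fallback label; the caller must decide whether it is usable. */
    return snprintf(buf, len, "cp%u", cp);
}

#ifdef _WIN32
/* Windows only: look up the current ANSI code page and name it. */
int default_charset_win32(char *buf, size_t len)
{
    return codepage_to_charset(GetACP(), buf, len);
}
#endif
```

The mapping is kept as a separate pure function so it can be checked on any platform; only the thin wrapper depends on the Win32 API.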
