harmony-dev mailing list archives

From Nathan Beyer <ndbe...@apache.org>
Subject Re: Shall we change our file.encoding
Date Fri, 17 Jul 2009 01:05:52 GMT
On Thu, Jul 16, 2009 at 1:28 AM, Charles Lee<littlee1032@gmail.com> wrote:
> Hi guys,
>
> I have added the locale function to DRLVM; the patch is attached. Please
> try the new patch on Linux.
>
> The patch should work on Linux but fail on Windows, because on Windows
> setlocale returns a code page rather than a charset name.
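
For context, the Linux side of such a change can be sketched roughly as
below. This is not the attached patch, only a minimal POSIX sketch of
asking the OS for the locale's codeset; the helper name
os_default_codeset is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): return the codeset name of
 * the user's locale, e.g. "UTF-8" under en_US.UTF-8. POSIX only. */
static const char *os_default_codeset(void)
{
    /* Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).
     * If that locale is unavailable, the call fails and the default "C"
     * locale stays in effect, so nl_langinfo still returns a valid name. */
    (void)setlocale(LC_CTYPE, "");

    /* CODESET yields the character encoding name of the current locale. */
    return nl_langinfo(CODESET);
}
```

A VM could pass the returned string on as the value of file.encoding,
leaving any alias resolution to the Charset lookup.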

Code page and character set are the same thing. We shouldn't need to
convert it as the Charset APIs will have to support the values anyway.

What's the value you're getting? If it's 'Cp1252', then we're good, as
that's just an alias for 'Windows-1252' (or vice-versa).
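
As a rough illustration of why a hard-coded table should not be needed:
assuming the Windows setlocale string has the form
"English_United States.1252" (an assumption about the value being
returned, not something confirmed in this thread), the numeric suffix
can be turned into the 'Cp1252' form directly. The helper name
codepage_alias_from_locale is made up for this sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a "CpNNNN" alias from a locale string that
 * ends in ".NNNN" (assumed Windows setlocale format).
 * Returns 0 on success, -1 if there is no purely numeric suffix or the
 * output buffer is too small. */
static int codepage_alias_from_locale(const char *locale, char *out,
                                      size_t out_size)
{
    const char *dot;
    const char *p;
    int n;

    if (locale == NULL || out == NULL || out_size == 0) {
        return -1;
    }

    /* The code page, if present, follows the last '.' in the string. */
    dot = strrchr(locale, '.');
    if (dot == NULL || dot[1] == '\0') {
        return -1;
    }

    /* Reject suffixes such as "UTF-8" that are not plain numbers. */
    for (p = dot + 1; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p)) {
            return -1;
        }
    }

    n = snprintf(out, out_size, "Cp%s", dot + 1);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the input "English_United States.1252" this fills the buffer with
"Cp1252". A string such as "en_US.UTF-8" is rejected, since its suffix
is already a charset name and needs no conversion.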

-Nathan


> I have tried for a long time to
> get the charset name from the code page, for example:
> CPINFOEX cpInfoEx;
> /* GetCPInfoEx returns nonzero on success */
> BOOL iReturn = GetCPInfoEx(CP_ACP, 0, &cpInfoEx);
> if (iReturn) {
>     printf("FULL NAME %s\n", cpInfoEx.CodePageName);
> }
> But I only get the full descriptive name, not a name in any standard format.
>
> There is a map of code page identifiers on MSDN (details here). I could
> hard-code this map in the file, but the note on MSDN says:
>      "ANSI code pages can be different on different computers, or can be
> changed for a single computer, leading to data corruption. For the most
> consistent results, applications should use Unicode, such as UTF-8 or
> UTF-16, instead of a specific code page."
> I am afraid a hard-coded map will fail on some machines. (By the way, this
> seems to suggest UTF-8 as the default again :-)
>
> There is also an Encoding class in VC++ (details here), but we cannot use
> it here.
>
> So does anyone know anything about locales on Windows?
> Again, shall we use UTF-8 as our default?
>
> On Wed, Jul 15, 2009 at 2:12 PM, Charles Lee <littlee1032@gmail.com> wrote:
>>
>> It seems we should add it in DRLVM.
>>
>> On Wed, Jul 15, 2009 at 1:58 PM, Regis <xu.regis@gmail.com> wrote:
>>>
>>> Nathan Beyer wrote:
>>>>
>>>> Is the IBM VME dealing with this correctly? Do we just need to fix
>>>> DRLVM?
>>>
>>> Yes. I only tested on Linux; IBM VME sets the property correctly.
>>>
>>>>
>>>> On Wed, Jul 15, 2009 at 12:25 AM, Regis<xu.regis@gmail.com> wrote:
>>>>>
>>>>> Kevin Zhou wrote:
>>>>>>
>>>>>> Yes, from luniglob.c, CL attempts to read the "file.encoding" property
>>>>>> from the VM but fails to get the correct encoding.
>>>>>>
>>>>>> Regis, do you know any other specific ways that CL can gain the right
>>>>>> property?
>>>>>
>>>>> We can get it from the OS directly. Maybe just read env variables on Linux?
>>>>>
>>>>>> On Wed, Jul 15, 2009 at 9:59 AM, Regis <xu.regis@gmail.com> wrote:
>>>>>>
>>>>>>> Charles Lee wrote:
>>>>>>>
>>>>>>>> Hi Nathan,
>>>>>>>>
>>>>>>>> If the file encoding derives from the OS, there must be some bugs in
>>>>>>>> it, because on my Linux machine the locale is en_US.UTF-8 but our
>>>>>>>> default codec is still ISO8859-1. Do you know where we can find
>>>>>>>> that code?
>>>>>>>>
>>>>>>> Classlib expects the VM to do this and set the property, but it
>>>>>>> doesn't, so we have to do this ourselves.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Jul 14, 2009 at 10:17 PM, Nathan Beyer <nbeyer@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Are we talking about Windows or Linux? The default file encoding
>>>>>>>>> should derive from the OS. I believe that's defined by the specs.
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 14, 2009, at 5:51 AM, Charles Lee <littlee1032@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Jul 14, 2009 at 6:12 PM, Jimmy, Jing Lv <firepure@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Charles, I believe UTF-8 is the default encoding for RI, and it
>>>>>>>>>>> sounds reasonable.
>>>>>>>>>>> BTW, it may encounter some compatibility problems; maybe we need
>>>>>>>>>>> to run more tests to verify?
>>>>>>>>>>>
>>>>>>>>>>> 2009/7/14 Charles Lee <littlee1032@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi guys:
>>>>>>>>>>>>
>>>>>>>>>>>> I am working on some of the ant junit test cases and hitting some
>>>>>>>>>>>> encoding problems. I think they may be caused by the different
>>>>>>>>>>>> default encodings of RI and Harmony. My locale is en_US.UTF-8;
>>>>>>>>>>>> the RI default is UTF-8 but Harmony's is 8859-1. I then came
>>>>>>>>>>>> across
>>>>>>>>>>>> HARMONY-3736<https://issues.apache.org/jira/browse/HARMONY-3736>
>>>>>>>>>>>> and the two diffs attached to that issue. It seems we always get
>>>>>>>>>>>> 8859-1, because (correct me if wrong :-):
>>>>>>>>>>>>
>>>>>>>>>>>> 1. We removed the code that sets it in the VM, so we always get
>>>>>>>>>>>>    null when we call the VM method.
>>>>>>>>>>>> 2. We set file.encoding in libglob.c: if we got null from the
>>>>>>>>>>>>    VM, we set 8859-1.
>>>>>>>>>>>
>>>>>>>>>>> Sorry, it should be luniglob.c
>>>>>>>>>>>
>>>>>>>>>>>> 3. We cannot set file.encoding at run time.
>>>>>>>>>>>>
>>>>>>>>>>>> ant uses UTF-8 to encode filenames that contain non-ASCII
>>>>>>>>>>>> characters. So why do we use ISO8859-1 as our unchangeable
>>>>>>>>>>>> default?
>>>>>>>>>>>> The wiki page http://en.wikipedia.org/wiki/ISO8859-1 says: "In
>>>>>>>>>>>> computing applications, encodings that provide full UCS support
>>>>>>>>>>>> (such as UTF-8<http://en.wikipedia.org/wiki/UTF-8> and
>>>>>>>>>>>> UTF-16 <http://en.wikipedia.org/wiki/UTF-16>) are finding
>>>>>>>>>>>> increasing favor over encodings based on ISO 8859-1." Should we
>>>>>>>>>>>> simply change iso8859-1 to utf-8?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>> Charles Lee
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Best Regards!
>>>>>>>>>>>
>>>>>>>>>>> Jimmy, Jing Lv
>>>>>>>>>>> China Software Development Lab, IBM
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Yours sincerely,
>>>>>>>>>> Charles Lee
>>>>>>>>>>
>>>>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Regis.
>>>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Regis.
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Regis.
>>
>>
>>
>> --
>> Yours sincerely,
>> Charles Lee
>>
>
>
>
> --
> Yours sincerely,
> Charles Lee
>
>
