axis-c-dev mailing list archives

From Nadir Amra <a...@us.ibm.com>
Subject Re: platform support: internationalization and EBCDIC vs ASCII
Date Mon, 14 Feb 2005 21:43:31 GMT
Sorry for not replying sooner...

The conclusion was that we need to do this :-)

This item is basically tied to ensuring all data in the engine is stored 
using wchar, which I do not think will be done for 1.5.

Still studying the code, but I will say that I was able to issue a client 
request on OS/400 (EBCDIC) successfully by simply adding a conversion step 
so that any outgoing data is converted from the character set of the 
process to UTF-8.  Incoming data seems to be handled OK by the XML parser, 
mainly because the XML parser returns the data in the character set of the 
process, which I was grateful for :-)  However, I did have to change the 
code so that, as the HTTP request came in, the literals being searched 
for, such as \r\n, were in the ASCII character set.  It was not much to 
change.
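
To illustrate the literal problem: on an EBCDIC compiler the C literal 
"\r\n" generally does not compile to the bytes 0x0D 0x0A that HTTP puts on 
the wire, so a search for it misses the line terminator.  A minimal sketch 
of a search that spells the ASCII values out numerically follows.  The 
names ASCII_CR, ASCII_LF and find_ascii_crlf are made up for this example 
and are not taken from the Axis source.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII code points for CR and LF, written as numbers so they stay correct
 * even when the compiler's execution character set is EBCDIC. */
#define ASCII_CR 0x0D
#define ASCII_LF 0x0A

/* Hypothetical helper: returns the offset of the first CR LF pair in
 * buf[0..len), or -1 if there is none. */
static long find_ascii_crlf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 1 < len; ++i) {
        if (buf[i] == ASCII_CR && buf[i + 1] == ASCII_LF)
            return (long)i;
    }
    return -1;
}
```

The same idea applies to any other protocol literal that is compared 
against raw bytes received from the network.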

More on this later.....

John Hawkins <HAWKINSJ@uk.ibm.com> wrote on 02/01/2005 11:55:06 AM:

> Hi Nadir,
>
> Whatever happened to this? Did we get any conclusions?
>
> John Hawkins
>
> Nadir Amra <amra@us.ibm.com>
> 22/12/2004 08:37
>
> Please respond to: "Apache AXIS C Developers List"
> To: "'Apache AXIS C Developers List'" <axis-c-dev@ws.apache.org>
> Subject: platform support: internationalization and EBCDIC vs ASCII
>
> Correct me if I am wrong....and sorry for the long note but it is 
> necessary.
> 
> The AXIS code has a restriction that the locale of the process must be 
> UTF-8: it assumes everything is in UTF-8.  Thus the code works 
> specifically in processes where the locale is set to UTF-8 or to a 
> single-byte ASCII character set such as the Latin-1 locales (since the 
> character set is a subset of UTF-8).  For locales that are neither 
> single-byte nor UTF-8, the code does not work so well.  Obviously the 
> code does not work on EBCDIC-based systems such as OS/400.
> 
> I need this restriction removed in version 1.5.
> 
> To remove the restriction, the code needs to be sensitive to the locale 
> of the process that the client is running in and assume any data received 
> from the client that is to be passed to a web service is in the character 
> set of the locale, and thus needs to be converted to UTF-8.  Similarly, 
> any data received from the web service needs to be converted to the 
> character set of the running process, since the various C-runtime string 
> functions are dependent on the locale of the process in order for the 
> functions to work properly.
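
As a rough illustration of the outgoing conversion described above, here 
is a minimal sketch using POSIX iconv.  The function name to_utf8 and its 
parameters are assumptions made for this example.  Nothing here is taken 
from the Axis code, and OS/400 may well call for a different conversion 
API.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts in[0..in_len) from from_codeset to UTF-8.
 * Returns the number of bytes written to out, or (size_t)-1 on failure
 * (unknown codeset, invalid input, or output buffer too small). */
static size_t to_utf8(const char *from_codeset,
                      const char *in, size_t in_len,
                      char *out, size_t out_cap)
{
    assert(from_codeset != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char ** for the input; it advances the pointer but
     * does not modify the input bytes. */
    char *src = (char *)in;
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;
}
```

On POSIX systems the character set of the running process can usually be 
obtained with nl_langinfo(CODESET) after calling setlocale(LC_ALL, ""), 
and passed in as from_codeset.  The incoming direction is the same call 
with the two codeset names swapped in iconv_open.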
> 
> The XML parsers can handle the data coming in from the Web service no 
> matter what the encoding, and there is no problem on that side of things. 
> I am assuming the data obtained by the XML parser is being transcoded to 
> UTF-8. 
> 
> In addition, there are hard-coded literal strings that are assumed to be 
> in ASCII.  These would also need to be changed. 
> 
> I plan on spending a lot of time in the next 4 weeks to get the 
> infrastructure built into the code to allow the code to run on OS/400. 
> Hopefully, the work I put in can easily be extended to other platforms, 
> so that if someone wanted to run in a Japanese locale, it would work with 
> minor changes.
> 
> My thoughts are that a user can indicate whether transcoding should be 
> enabled via a configuration property in the property file.  When it is 
> enabled, the code will create transcoders to convert data from the locale 
> of the process to UTF-8 and from UTF-8 to the locale of the process.  I 
> still have to investigate whether it is possible to use the XML parser 
> transcoders.  I am looking for direction from you all on what a good 
> implementation would be and where in the code you think this support 
> would need to be added. 
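
One possible shape for the property-driven setup described above, sketched 
with POSIX iconv descriptors standing in for the transcoders.  The struct, 
the function names and the boolean flag are all hypothetical.  How the 
property would actually be read from the Axis property file is not shown 
in this thread and is not modelled here.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of converters, created once when transcoding is on. */
typedef struct {
    bool    enabled;    /* false: data passes through untouched */
    iconv_t to_utf8;    /* process character set -> UTF-8 */
    iconv_t from_utf8;  /* UTF-8 -> process character set */
} transcoders;

/* Returns 0 on success, -1 if a converter could not be created. */
static int transcoders_open(transcoders *t, bool enabled,
                            const char *process_codeset)
{
    assert(t != NULL);
    t->enabled = false;
    t->to_utf8 = t->from_utf8 = (iconv_t)-1;
    if (!enabled)
        return 0;               /* property off: nothing to create */

    assert(process_codeset != NULL);
    t->to_utf8 = iconv_open("UTF-8", process_codeset);
    if (t->to_utf8 == (iconv_t)-1)
        return -1;
    t->from_utf8 = iconv_open(process_codeset, "UTF-8");
    if (t->from_utf8 == (iconv_t)-1) {
        iconv_close(t->to_utf8);
        t->to_utf8 = (iconv_t)-1;
        return -1;
    }
    t->enabled = true;
    return 0;
}

/* Releases both converters; safe to call when transcoding was disabled. */
static void transcoders_close(transcoders *t)
{
    assert(t != NULL);
    if (t->enabled) {
        iconv_close(t->to_utf8);
        iconv_close(t->from_utf8);
        t->to_utf8 = t->from_utf8 = (iconv_t)-1;
        t->enabled = false;
    }
}
```

Opening the converters once and reusing them avoids paying the setup cost 
on every request.  With the property off, callers can check the enabled 
flag and skip conversion entirely.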
> 
> As for the literal strings that should be in the Latin-1 character set, 
> this is easily worked around by putting the string in a buffer and 
> converting it with the PLATFORM_STRTOASC() macro (currently in each 
> PlatformSpecificXXXX.hpp file).  For ASCII-based systems, these macros 
> are identity macros.  In addition, if data in a buffer is known to be in 
> the Latin-1 character set and needs to be converted to the character set 
> of the process, PLATFORM_ASCTOSTR() can be used.  Again, for ASCII-based 
> systems, these macros are identity macros.  I plan on doing this as a 
> first stage, which should be a benign change.
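
A sketch of how such a macro pair might be laid out.  The macro names come 
from the message above, but their real definitions in the 
PlatformSpecificXXXX.hpp files are not shown in this thread, so the bodies 
below are assumptions.  The two converter functions and the wire_literal 
helper are hypothetical, and the character-constant test in the #if is 
only a rough way of telling ASCII from EBCDIC.

```c
#include <assert.h>
#include <stddef.h>

#if 'A' == 0x41
/* ASCII-based character set: identity macros, nothing to convert. */
#define PLATFORM_STRTOASC(buf) (buf)
#define PLATFORM_ASCTOSTR(buf) (buf)
#else
/* Non-ASCII (for example EBCDIC): the platform port would supply in-place
 * converters.  Both function names are hypothetical. */
char *platform_str_to_ascii(char *buf);
char *platform_ascii_to_str(char *buf);
#define PLATFORM_STRTOASC(buf) platform_str_to_ascii(buf)
#define PLATFORM_ASCTOSTR(buf) platform_ascii_to_str(buf)
#endif

/* Hypothetical helper: copies a literal into a writable buffer
 * (truncating if needed) and converts it for the wire. */
static char *wire_literal(char *buf, size_t cap, const char *literal)
{
    assert(buf != NULL && literal != NULL && cap > 0);
    size_t i = 0;
    while (literal[i] != '\0' && i + 1 < cap) {
        buf[i] = literal[i];
        ++i;
    }
    buf[i] = '\0';
    return PLATFORM_STRTOASC(buf);
}
```

On an ASCII-based system wire_literal(buf, sizeof buf, "POST") simply 
returns buf unchanged, which is why this stage should not alter behaviour 
on the existing platforms.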
> 
> What are your thoughts?
> 

