From: damitha@opensource.lk
To: axis-dev@ws.apache.org
Subject: Axis C++ problem--FYI
Date: Wed, 3 Sep 2003 17:00:15 +0600

Hi all,

The code in Axis C++ CVS compiles on both Windows and Linux, but it works
only on Windows. We have identified the problem, which is explained in the
Xerces mailing list thread quoted below: wchar_t is 32 bits on Linux and
16 bits on Windows. We made the mistake of assuming that wchar_t is 16 bits
on every platform. We hope to solve this problem immediately and apologise
for any inconvenience caused.

The problem is described on the Xerces mailing list as follows.

========================================================================
List: xerces-c-dev
Subject: RE: wchar_t and XMLCh
From: "Nikko"
Date: 2003-03-04 20:58:28

Believe me, wchar_t is evil. Redefine your own 16-bit string or use a
string instead, which you can convert to/from XMLCh* easily.
Even if unsigned short is theoretically not guaranteed to be two bytes
long, relying on that is far more dependable than relying on wchar_t being
2 bytes long.

Best

-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, March 4, 2003 20:27
To: xerces-c-dev@xml.apache.org
Subject: RE: wchar_t and XMLCh

> Thanks for your suggestion. That will probably work in every case except
> this one. The reason is that we are building a wrapper library over
> Xerces and our interface exposes only std::wstring. We don't expose
> internal Xerces types. In particular on Solaris, we want to link against
> the STLport library. Is there any requirement in Xerces that will force
> XMLCh to be 2 bytes? If all Xerces code uses sizeof(XMLCh) then it should
> probably be OK, but if there is any hard-coded value (which assumes
> 2 bytes), then the change won't work.

I suggest you typedef something which mirrors the Xerces XMLCh typedef and
use std::basic_string with that type. Otherwise, you risk some
incompatibility with Xerces now, or in the future. You also inadvertently
encourage the use of wide-character functions that may not be prepared to
accept UTF-16 code points:

    // find the first newline character in the Xerces string.
    const wchar_t* const newlineChar = wcschr(xercesStr.c_str(), 10);

Will this work? Maybe, but who knows? A particular compiler/platform has a
particular encoding for wchar_t, and you should not attempt to force
improperly encoded code points into it.

Of course, you can always change the Xerces typedef to wchar_t and do what
you want, but that means you're on your own if there's a problem now or in
the future. You also have to build a custom version of Xerces for every
platform and be prepared to support it. It seems just a bit too scary for
me.

Dave

From: qchen@...
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: wchar_t and XMLCh
Date: 03/04/2003 09:41 AM
Please respond to xerces-c-dev

David,

Thanks for your suggestion. That will probably work in every case except
this one.
The reason is that we are building a wrapper library over Xerces and our
interface exposes only std::wstring. We don't expose internal Xerces types.
In particular on Solaris, we want to link against the STLport library. Is
there any requirement in Xerces that will force XMLCh to be 2 bytes? If all
Xerces code uses sizeof(XMLCh) then it should probably be OK, but if there
is any hard-coded value (which assumes 2 bytes), then the change won't
work.

Qi Chen

-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, March 04, 2003 10:08 AM
To: xerces-c-dev@xml.apache.org
Subject: Re: wchar_t and XMLCh

> Basically I need to convert the XMLCh* to a std::wstring and vice versa.
> In Xerces, XMLCh is typedef-ed to unsigned short (2 bytes). Under Win32,
> there is no need for conversion since wchar_t is also typedef-ed to
> unsigned short. In Solaris/Linux/VMS, however, wchar_t is typedef-ed to
> unsigned long (4 bytes), so the conversion seems to be inevitable.

There are several reasons there's no need for conversion on Win32. One is
that Visual C++ 6.0 does not implement wchar_t as a proper type, which is
not correct. Most of the platforms to which you refer, depending on the age
of the compiler, _do_ implement wchar_t as a proper type, and not as a
typedef. The other, and more important, reason is that Win32 uses Unicode,
so wide characters are known to be UCS-2/UTF-16 code points.

> My question is: Does the Xerces implementation require the size of XMLCh
> to be 2 bytes? If I change the typedef of XMLCh to wchar_t and recompile
> Xerces, would it work? I know the answer is probably no, but I just want
> to make sure. Of course the memory usage will be doubled if we change
> XMLCh to 4 bytes, but that is not a concern for me.

For any given operating system, the issue is not really the size of XMLCh;
it's whether the operating system assumes wide characters are UCS-2/UTF-16
code points.
If not, there's no point in making XMLCh and wchar_t compatible, because
the OS cannot process them. You should re-examine why you're storing
UTF-16 encoded characters, like those Xerces produces, in std::wstring.
std::basic_string<XMLCh> might be a better choice.

Dave
========================================================================

Note: We can immediately solve this by reverting to XMLString::transcode,
but that is highly inefficient. We are working on an alternative.

damitha
--
Lanka Software Foundation (http://www.opensource.lk)
Promoting Open-Source Development in Sri Lanka
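
The size mismatch discussed in this thread can be illustrated with a small
sketch. It is only an illustration under stated assumptions, not the fix
adopted for Axis C++ and not part of the Xerces API: it assumes a C++11
compiler, uses char16_t / std::u16string as a stand-in for a guaranteed
16-bit code unit type such as XMLCh, and the helper name utf16ToWide is
hypothetical. The point is that the conversion to std::wstring has to check
the width of wchar_t instead of assuming 16 bits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces or Axis): converts UTF-16 code
// units to a std::wstring without assuming the width of wchar_t.
// Assumption: char16_t stands in for a 16-bit code unit type like XMLCh.
std::wstring utf16ToWide(const std::u16string& in)
{
    std::wstring out;
    out.reserve(in.size());

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];

        if (sizeof(wchar_t) == 2) {
            // 16-bit wchar_t (e.g. Windows): copy the code unit unchanged.
            out.push_back(static_cast<wchar_t>(unit));
            continue;
        }

        // Wider wchar_t (e.g. Linux): store one code point per element,
        // so a valid surrogate pair is combined into a single value.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size()) {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                const char32_t codePoint =
                    0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                out.push_back(static_cast<wchar_t>(codePoint));
                ++i; // the low surrogate has been consumed
                continue;
            }
        }

        // BMP character, or an unpaired surrogate passed through as-is.
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}
```

For plain BMP text such as u"abc" the result is the same on every platform.
For a character outside the BMP, the result has one element where wchar_t
is wider than 16 bits and two elements (the unchanged surrogate pair) where
wchar_t is 16 bits, which is the kind of platform difference that a blind
cast between the two string types hides.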