xerces-c-users mailing list archives

From "Freimann, Mario" <mario.freim...@siemens.com>
Subject Reading german umlaute from xml file
Date Fri, 25 Jul 2008 14:38:52 GMT
Dear mailing list users,

I have an XML file that I want to process with Xerces, and I have run into a problem with
German umlauts. I extracted the code I use to show the problem. After reading the mailing
list I switched from XMLString::transcode to a UTF-8 transcoder, but that doesn't work either.
The problematic platform is Red Hat Enterprise Linux (version 3, 32-bit, and version 5, 64-bit).
On Solaris 10, XMLString::transcode works well, but the transcoder doesn't work there either.
The Xerces version is 2.7, linked statically.


I don't know where the problem with the transcoder is; any help is appreciated.


[[xml example file test.xml]]

<?xml version="1.0" encoding="UTF-8"?><xtest><test>M&#252;nchen</test></xtest>

[[code]]

#include <stdio.h>
#include <string.h>

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/framework/MemBufInputSource.hpp>

#include <xercesc/util/TransService.hpp>

XERCES_CPP_NAMESPACE_USE

int main(int argc, char** argv)
{
        XMLPlatformUtils::Initialize();

        XercesDOMParser* _parser = new XercesDOMParser();

        _parser->setValidationScheme(XercesDOMParser::Val_Auto);
        _parser->setIncludeIgnorableWhitespace(false);
        _parser->setDoNamespaces(true);
        _parser->setDoSchema(true);

        XMLCh* xFile = NULL;

        xFile = XMLString::transcode("test.xml");
        _parser->parse( xFile );
        XMLString::release(&xFile);

        DOMDocument* _xmlDoc = _parser->getDocument();

        DOMNode* rootNode = _xmlDoc->getDocumentElement();

        char nodeName[1024] = "";
        char nodeValue[1024] = "";
        char* tmp = NULL;

        DOMNode* childNode = rootNode->getFirstChild();
        tmp = XMLString::transcode( childNode->getNodeName() );
        strcpy( nodeName, tmp );
        printf("name [%s]\n", nodeName );fflush(stdout);

        childNode = childNode->getFirstChild();
        tmp = XMLString::transcode( childNode->getNodeValue() );
        strcpy( nodeValue, tmp );
        printf("value xmlstring [%s]\n", nodeValue );fflush(stdout);

        XMLTranscoder* utf8Transcoder ;
        XMLTransService::Codes failReason;

        XMLCh* xmlChars = new XMLCh[ 1024 ];
        unsigned int eaten = 0;
        unsigned char* charSizes = new unsigned char[1024];

        utf8Transcoder = XMLPlatformUtils::fgTransService->makeNewTranscoderFor("UTF-8", failReason, 16*1024);
        utf8Transcoder->transcodeFrom( (XMLByte*)childNode->getNodeValue(),
                                       XMLString::stringLen( childNode->getNodeValue() ),
                                       xmlChars, 1024, eaten, charSizes );
        printf("value xmlchars [%s] eaten [%d] charSizes [%s]\n", xmlChars, eaten, charSizes );

        return 0;
}

[[output after calling test application on RHEL 5]]

name [test]
value xmlstring []
value xmlchars [M] eaten [2] charSizes []

[[output after calling test application on Sol 10]]

name [test]
value xmlstring [München]
value xmlchars [] eaten [3] charSizes []
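
One possible reading of the two "value xmlchars" lines, offered as a sketch rather than a confirmed diagnosis: the cast to XMLByte* hands the node value's UTF-16 code units to transcodeFrom as if they were UTF-8 bytes, and the XMLCh result is then printed with %s. Seen as bytes, 'M' (0x004D) is 4D 00 on a little-endian machine and 00 4D on a big-endian one, so a byte-oriented reader stops right after "M" or right before it. That would match the RHEL and Solaris output if the Solaris machine is big-endian (an assumption; the post does not say). The self-contained C++ below uses no Xerces calls. utf16ToUtf8 and asCString are hypothetical helper names; they show the byte view and the UTF-8 bytes that a UTF-16 to UTF-8 conversion would be expected to produce for "München".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces API): encode UTF-16 code units as UTF-8.
std::string utf16ToUtf8(const std::vector<std::uint16_t>& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a valid surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: covers U+00FC
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: what a byte-oriented reader such as printf("%s")
// sees when a UTF-16 buffer is passed where a char* is expected.
std::string asCString(const std::vector<std::uint16_t>& in, bool bigEndian)
{
    std::string bytes;
    for (std::uint16_t u : in) {
        const char hi = static_cast<char>(u >> 8);
        const char lo = static_cast<char>(u & 0xFF);
        bytes += bigEndian ? hi : lo;
        bytes += bigEndian ? lo : hi;
    }
    // Reading stops at the first zero byte, as it would for a C string.
    return std::string(bytes.c_str());
}
```

Applied to the code units of "München" (0x004D, 0x00FC, 0x006E, 0x0063, 0x0068, 0x0065, 0x006E), utf16ToUtf8 yields the bytes 4D C3 BC 6E 63 68 65 6E, while asCString yields "M" for little-endian and an empty string for big-endian. In Xerces, converting XMLCh data to UTF-8 bytes would presumably go through XMLTranscoder::transcodeTo rather than transcodeFrom; the exact signature should be checked against the 2.7 headers.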


With kind regards,
Mario Freimann


