xerces-c-users mailing list archives

From "Jesse Pelton" <...@PKC.com>
Subject RE: DOMLSSerializer converts white space characters in attributes to xml entities
Date Thu, 30 Jul 2009 15:54:38 GMT
Note that this behavior is required by the XML specification.  See http://www.w3.org/TR/2008/REC-xml-20081126/#AVNormalize.
 It's dense, but in summary, when an attribute value is loaded, each literal tab, carriage
return, and linefeed is converted to a space.  If the attribute is declared with a type other
than CDATA, leading and trailing spaces are also discarded, and each sequence of spaces is
collapsed to a single space.

An undeclared attribute is treated as CDATA, but the safest thing for a serializer to do is
assume nothing about how the attribute is declared (or about how whatever processor loads the
document will treat it) and that whitespace should be preserved.  The only way to guarantee
that for tabs, carriage returns, and linefeeds is to write them as character references.
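
To make the rule concrete, here is a rough C++ sketch of the normalization a parser applies when it loads an attribute value. It is a simplified model, not Xerces-C code: the function names are made up for illustration, only numeric character references below 0x80 are decoded, and named entity references are passed through untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: tries to decode a numeric character reference that
// starts at raw[pos] (the '&'). On success it stores the character in
// `decoded`, moves pos to the closing ';', and returns true.
static bool decodeCharRef(const std::string& raw, std::size_t& pos, char& decoded) {
    if (raw.compare(pos, 2, "&#") != 0) return false;
    std::size_t i = pos + 2;
    unsigned long base = 10;
    if (i < raw.size() && raw[i] == 'x') { base = 16; ++i; }
    const std::size_t start = i;
    unsigned long code = 0;
    for (; i < raw.size() && raw[i] != ';'; ++i) {
        const char ch = raw[i];
        unsigned long digit;
        if (ch >= '0' && ch <= '9') digit = ch - '0';
        else if (base == 16 && ch >= 'a' && ch <= 'f') digit = ch - 'a' + 10;
        else if (base == 16 && ch >= 'A' && ch <= 'F') digit = ch - 'A' + 10;
        else return false;
        code = code * base + digit;
        if (code >= 0x80) return false;  // assumption: this sketch is ASCII-only
    }
    if (i == start || i >= raw.size()) return false;  // no digits or no ';'
    decoded = static_cast<char>(code);
    pos = i;
    return true;
}

// Simplified model of XML 1.0 attribute-value normalization (section 3.3.3).
// `raw` is the text between the quotes; `isCdata` says whether the attribute
// is CDATA (which is also how an undeclared attribute is treated).
std::string normalizeAttribute(const std::string& raw, bool isCdata) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        char decoded;
        if (raw[i] == '&' && decodeCharRef(raw, i, decoded)) {
            out += decoded;  // a referenced character is kept as it is
        } else if (raw[i] == ' ' || raw[i] == '\t' || raw[i] == '\n' || raw[i] == '\r') {
            out += ' ';      // literal whitespace becomes a single space
        } else {
            out += raw[i];
        }
    }
    if (isCdata) return out;

    // Non-CDATA types: drop leading/trailing spaces and collapse runs of spaces.
    std::string collapsed;
    for (char c : out) {
        if (c == ' ' && (collapsed.empty() || collapsed.back() == ' ')) continue;
        collapsed += c;
    }
    if (!collapsed.empty() && collapsed.back() == ' ') collapsed.pop_back();
    return collapsed;
}
```

Under this model, a literal "\n Hello \t\t testing" written straight into the document loads back with every newline and tab replaced by a space, while "&#xA; Hello &#x9;&#x9; testing" loads back with the original characters intact.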

-----Original Message-----
From: Alberto Massari [mailto:amassari@datadirect.com]
Sent: Thu 7/30/2009 11:37 AM
To: c-users@xerces.apache.org
Subject: Re: DOMLSSerializer converts white space characters in attributes to xml entities
No, if the serialized attribute value has newlines/tabs, they are
converted into spaces upon loading. If you really want to store such
characters in an attribute, they have to be encoded as character references.
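
As an illustration of that encoding, a serializer could escape an attribute value roughly like this. This is a minimal sketch rather than the Xerces-C implementation; escapeAttributeValue is a hypothetical name, and it assumes the value will be written between double quotes.

```cpp
#include <string>

// Hypothetical helper, not part of Xerces-C: escapes an attribute value so
// that tabs, linefeeds, and carriage returns survive a reload unchanged.
std::string escapeAttributeValue(const std::string& value) {
    std::string out;
    for (char c : value) {
        switch (c) {
            case '\t': out += "&#x9;";  break;  // tab
            case '\n': out += "&#xA;";  break;  // linefeed
            case '\r': out += "&#xD;";  break;  // carriage return
            case '&':  out += "&amp;";  break;  // markup characters
            case '<':  out += "&lt;";   break;
            case '"':  out += "&quot;"; break;  // assumes double-quoted output
            default:   out += c;
        }
    }
    return out;
}
```

Applied to the value "\n Hello \t\t testing", this produces the same text the serializer printed: &#xA; Hello &#x9;&#x9; testing.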


mini thomas wrote:
> Hi,
> I am using Xerces 3.0.1 and doing the following:
> 1) Parse a string.
> 2) Set an attribute "newattr" on the root node. The attribute value is
> const char *temp = "\n Hello \t\t testing";
> 3) Convert the parsed data back to XML:
> static const XMLCh gLS[] = { chLatin_L,  chLatin_S,  chNull };
> DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(gLS);
> DOMLSSerializer*  myWriter = (impl)->createLSSerializer();
> DOMConfiguration* dc = myWriter->getDomConfig();
> dc->setParameter( XMLUni::fgDOMWRTDiscardDefaultContent,true);
> // serialize the DOMNode to a UTF-16 string
> XMLCh* theXMLString_Unicode = myWriter->writeToString(toWrite.GetDOMNodePtr());
> 4) Convert theXMLString_Unicode to char* and print using cout.
> I got the attribute printed this way:
> newattr="&#xA; Hello &#x9;&#x9; testing"
> Is there any way to get the attribute printed as newattr="
>  Hello  testing"
> Thanks,
> Mini
