xerces-j-users mailing list archives

From "Swanson, Brion" <Brion.Swan...@westgroup.com>
Subject RE: Transforming XML document
Date Thu, 14 Nov 2002 13:58:22 GMT
Simon,

You are mostly correct.  However, whitespace in between elements
(e.g. <tag1/>  <tag2/>) will be preserved if there is no DTD associated
with the document.

The parser has no way of knowing what whitespace is "ignorable" if there is
no DTD to describe the structure of the XML and therefore even whitespace
between the end tag of one element and the start tag of another element is
reported in the characters() SAX event.

I agree that one usually does not want to preserve this whitespace, and
if a DTD is associated, the ignorableWhitespace() event is called instead
of characters() for inter-tag whitespace.  However, it is incorrect to say
that the parser ignores all whitespace other than intra-tag whitespace.
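
To make the difference concrete, here is a minimal sketch using the JAXP
SAX API. The class name WhitespaceEvents, the Counter handler and the
sample documents are made up for illustration. The DTD case assumes
validation is switched on, since SAX only requires validating parsers to
report whitespace in element content through ignorableWhitespace(); a
non-validating parser may or may not do so.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceEvents {
    /** Hypothetical handler: counts characters delivered by each callback. */
    static class Counter extends DefaultHandler {
        int viaCharacters = 0;
        int viaIgnorable = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            viaCharacters += length;   // may be called in several chunks
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            viaIgnorable += length;
        }
    }

    /** Parses the given XML text and returns the filled-in counter. */
    static Counter parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);  // assumption: validate only when a DTD is present
        SAXParser parser = factory.newSAXParser();
        Counter counter = new Counter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Two spaces between <a/> and <b/>, no DTD.
        String noDtd = "<root><a/>  <b/></root>";
        // Same content, but an internal DTD says root holds only elements.
        String withDtd = "<!DOCTYPE root [<!ELEMENT root (a,b)>"
                + "<!ELEMENT a EMPTY><!ELEMENT b EMPTY>]>"
                + "<root><a/>  <b/></root>";

        Counter c1 = parse(noDtd, false);
        Counter c2 = parse(withDtd, true);
        System.out.println("no DTD:   characters=" + c1.viaCharacters
                + " ignorable=" + c1.viaIgnorable);
        System.out.println("with DTD: characters=" + c2.viaCharacters
                + " ignorable=" + c2.viaIgnorable);
    }
}
```

A handler that wants to drop inter-element whitespace can simply leave
ignorableWhitespace() empty when a DTD is available; without one it has
to decide for itself which characters() calls carry only layout.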

You are right about everything else: the spacing of attributes (and, even
more to the point, the ordering of attributes) is not guaranteed to be
preserved under the XML spec.

Cheers!
Brion

-----Original Message-----
From: Simon Kitching [mailto:simon@ecnetwork.co.nz]
Sent: Thursday, November 14, 2002 1:06 AM
To: xerces-j-user@xml.apache.org
Subject: Re: Transforming XML document


Hi,

Unfortunately, you're out of luck. XML parsing just doesn't work that
way.

XML parsers are required to respect the contents of *text nodes*
within an XML document, but in every other place whitespace is not
significant according to the spec.

You can either treat the input file as a plain text file (e.g. use perl to
modify it), or you can treat it as XML, in which case the XML parser will
guarantee to preserve the *meaning* of your XML document, but not
necessarily its layout.
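
As a rough sketch of the plain-text route in Java rather than perl, the
snippet below rewrites one attribute value with a regular expression and
leaves every other byte alone. The class and method names are invented
for this example, and it makes loud assumptions: the value is in double
quotes, and the element does not appear inside comments or CDATA
sections. It is not a substitute for a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextualAttributeEdit {
    /**
     * Replaces the value of the given attribute on every start tag of the
     * given element, keeping the original spacing around name and '='.
     * Assumes double-quoted values and no '>' inside attribute values.
     */
    static String replaceAttribute(String text, String element,
                                   String attribute, String newValue) {
        Pattern p = Pattern.compile(
                "(?<head><" + Pattern.quote(element) + "\\b[^>]*?\\b"
                + Pattern.quote(attribute) + "\\s*=\\s*\")[^\"]*(?<tail>\")");
        // quoteReplacement keeps '$' and '\' in newValue from being
        // interpreted as group references.
        return p.matcher(text).replaceAll(
                "${head}" + Matcher.quoteReplacement(newValue) + "${tail}");
    }

    public static void main(String[] args) {
        String in = "<x  y = \"a\">text</x>";
        System.out.println(replaceAttribute(in, "x", "y", "b"));
    }
}
```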

For example, <x  y="a"> (two spaces) in XML means *exactly* the same
thing as <x y="a"> (one space).
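
One way to check this for yourself is to parse both spellings into a DOM
and compare the trees. The sketch below uses the standard
Node.isEqualNode() method from DOM Level 3; the SameMeaning class and its
helper names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SameMeaning {
    /** Parses XML text and returns its root element. */
    static Element parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    /** True if both texts produce structurally equal element trees. */
    static boolean sameMeaning(String a, String b) throws Exception {
        return parse(a).isEqualNode(parse(b));
    }

    public static void main(String[] args) throws Exception {
        // Spacing inside the tag, and <x/> versus <x></x>.
        System.out.println(sameMeaning("<x  y = \"a\"/>", "<x y=\"a\"></x>"));
        // Attribute order.
        System.out.println(sameMeaning("<x a=\"1\" b=\"2\"/>", "<x b=\"2\" a=\"1\"/>"));
        // Whitespace in element content is a text node, so this differs.
        System.out.println(sameMeaning("<x y=\"a\"> </x>", "<x y=\"a\"/>"));
    }
}
```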

You do get to choose the "style" in which the output is generated
(indented or not, how much indenting, etc) but you cannot ask for "the
same as the input", because no existing XML parser bothers to keep that
information around.
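
For the XML route, a minimal sketch with the standard JAXP DOM and
Transformer APIs might look like the following. The RoundTrip class and
changeAttribute method are hypothetical names; the only style knobs shown
are the standard OutputKeys.INDENT and OMIT_XML_DECLARATION properties,
and the exact output layout depends on the serializer in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RoundTrip {
    /**
     * Parses the XML, sets one attribute on the root element, and
     * serializes the result in the serializer's own layout.
     */
    static String changeAttribute(String xml, String attribute,
                                  String newValue, boolean indent)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().setAttribute(attribute, newValue);

        // An identity transformer copies the DOM to the output stream.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The input spacing inside the start tag is not carried over.
        System.out.println(
                changeAttribute("<x  y = \"a\"><z/></x>", "y", "b", false));
    }
}
```

The attribute value changes as requested, but the spacing inside the
start tag comes from the serializer, not from the input file.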

Regards,

Simon

On Thu, 2002-11-14 at 18:36, Wai-Yip Tung wrote:
> I am trying to make a simple transformation on an XML document. Let's say
> just changing one attribute value. I want to keep everything else the same,
> including white space.
> 
> My first task is to parse and output a document identical to the input
> document. It seems the sample code sax.Writer is a good example.
> Unfortunately it altered the document in several ways
> 
> - white space in an element is changed, e.g.
>   <x  y = "a"> becomes <x y="a">
> 
> - The empty element becomes two tags, e.g.
>   <x/> becomes <x></x>
> 
> Can anyone give me some direction?
> 
> Wai yip
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-user-help@xml.apache.org
> 
> 


