xerces-j-users mailing list archives

From Simon Kitching <si...@ecnetwork.co.nz>
Subject re: bug 14531, XML Serialization and standalone
Date Thu, 14 Nov 2002 05:54:53 GMT

I recently raised a bug against xerces' XMLSerializer class regarding
the "standalone" attribute: 

Glenn Marcy's comment on this bug has left me rather confused.
I went back to the W3C XML 1.0 spec, and am now even more confused :-)

I am therefore continuing this on the users' list rather than in the bug
comments or on the dev list.

Possibly this discussion belongs to a general xml-users list rather than
the xerces-users list. If the general consensus here is that the
XMLSerializer class is indeed doing the right thing for the standalone
attribute, I'll take this email to a more appropriate discussion list.

Glenn, I have CC'd you directly on this in case you are not on the
users' list and are willing to help me out here. I will leave you off
any future emails on this topic unless you indicate otherwise.


The original bug raised by me [excerpt]:

> Currently, the XMLSerializer class outputs the "standalone" 
> attribute of the <?xml ...?> prolog if-and-only-if the public
> and system identifiers being output in the DOCTYPE tag are null.
> It seems to me to be perfectly valid to have standalone="yes" 
> AND public/system IDs. The relevant section of the xml spec is:
> http://www.w3.org/TR/REC-xml#sec-rmd
> No mention is made here of forbidding standalone=yes when a 
> DTD ID is given in the DOCTYPE tag.


Glenn Marcy commented on the original bug:

>> standalone="yes" --> the DTD must be read if-and-only-if validation
>> is enabled. (ie DTD can be ignored if validation disabled, a good
>> optimisation!)

> This is incorrect.  The document might not actually be 
> standalone, which would only be a failure of a validity 
> constraint, which a non-validating processor would not check.
>  Therefore, the document can contain references to external markup
> declarations that change the infoset of the document, like default
> values for attributes that are not specified.  If a non-validating
> processor reads those declarations then it is obligated to act on
> them.  The fact that the standalone declaration is in error does not
> change this.
> Now obviously a non-validating processor is not obligated to read
> external markup declarations at all, but Xerces already has features
> defined to control this behavior.  There is nothing in the XML
> specification that says that the presence of standalone="yes" 
> should cause non-validating processors to change 
> their behavior with respect to reading external entities.

I think we have very different interpretations of what "standalone"
means - which probably means mine is wrong. But what I understand it to
mean, when embedded within a source xml document, is:

"Parser, I *promise* you that there is nothing in the DTD specified in
the DOCTYPE tag (or any other external entity) which will affect the
results of parsing this file. If you are a validating parser, then you
will need to process external entities anyway in order to check the
document syntax, but if you are not a validating parser, then there is
no need to read the DTD."

Section 2.9 of the XML spec says:

"In a standalone document declaration, the value "yes" indicates that
there are no external markup declarations which affect the information
passed from the XML processor to the application. "

Surely this means that by specifying standalone='yes' and validation=no,
xml parsing will be faster because the parser can completely ignore all
external markup declarations?

Of course, if the XML document containing the standalone='yes' statement
is lying (there are indeed things in the DTD which affect the created
document, like default attribute values) then the result of parsing will
be incorrect. 

> This is incorrect.  The document might not actually be 
> standalone

But that's not the parser's problem, is it? If I am wrong, I get what I
deserve. And if I write an xml document, and set standalone='yes'
because I know the DTD doesn't define any default attribute values etc,
and I want the performance benefits that come from allowing the parser
to skip the DTD processing, then why should the parser read the DTD
anyway in an attempt to prove me a liar?

Yes, setting "standalone=yes" is therefore a dangerous thing to do; if
the DTD does define something significant then the results of parsing
are incorrect. But that's life, no?

> Now obviously a non-validating processor is not obligated to read
> external markup declarations at all ...

I thought that a non-validating parser still had to read external markup
to determine default attributes, etc. It just doesn't need to report any
violation of the xml structure. That has certainly been my experience
with Xerces in the past; disabling validation then parsing a file with a
DOCTYPE containing a SYSTEM entry gives me errors about being unable to
find the file. In fact I have had to define an EntityResolver which
returns empty DTDs in order to mimic "standalone=yes" behaviour when
parsing XML from our customers where we don't have a copy of the DTD
locally (and don't need one because there are no default attributes etc
in the DTD).
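
For illustration, the EntityResolver workaround described above might look
roughly like the sketch below. It uses only the standard JAXP and SAX
interfaces (DocumentBuilderFactory, EntityResolver, InputSource). The
class name EmptyDtdResolverSketch, the method parseIgnoringDtd and the
example.invalid URL are made up for this example, and the lambda assumes
a modern Java rather than anything available in 2002. It is a sketch of
the idea, not the exact code we run.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: names are illustrative, not part of Xerces.
class EmptyDtdResolverSketch {

    // Answer every external entity request with empty content, so the
    // parser never tries to fetch the DTD named in the DOCTYPE.
    static final EntityResolver EMPTY_ENTITIES =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Parse without validation, substituting an empty DTD.
    // Assumption: the real DTD declares nothing that affects the result
    // (no default attribute values, no entity declarations).
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_ENTITIES);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<?xml version=\"1.0\" standalone=\"yes\"?>"
            + "<!DOCTYPE order SYSTEM \"http://example.invalid/order.dtd\">"
            + "<order id=\"7\"/>";
        Document doc = parseIgnoringDtd(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The catch is the one already noted: if the real DTD does declare default
attribute values or entities, this resolver silently drops them. Xerces
also documents a feature,
http://apache.org/xml/features/nonvalidating/load-external-dtd, which may
be what Glenn is referring to; I have not tried it in this sketch.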

Am I misunderstanding something here? Any comments welcome!


