xerces-j-users mailing list archives

From Simon Kitching <si...@ecnetwork.co.nz>
Subject re: bug 14531, XML Serialization and standalone
Date Thu, 14 Nov 2002 21:28:58 GMT
Thanks for the excellent explanation, Glenn.

Clearly the issue of what to do when *parsing* an xml document which has
standalone=yes is not trivial, and the results will be parser-dependent,
and parser-configuration-dependent. It is very useful to know this. And
very interesting to know that "validation=false" in Xerces really just
suppresses error reporting, rather than making Xerces a non-validating
parser. That explains a few things...

However the issue I face is *serialization* of xml. Basically, I need to
read xml generated by customer A, transform it using xslt and serialize
it into a format as specified by customer B.

Now, if customer B says that they want standalone=yes and a doctype with
a public or system ID, there is currently *no way* to generate this
output using the XMLSerializer class. If there is a public or system ID,
then any standalone attribute I specify in the OutputFormat is ignored.
What I would need to do is post-process the output (sed??) to add the
standalone attribute!
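
For the record, the post-processing workaround could look roughly like the
sketch below. It is only an illustration: the class and method names are
made up for this mail, it works on the serialized output held in a String,
and it assumes the XML declaration (if any) is the very first thing in that
string (no byte-order mark, no leading whitespace).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xerces: rewrites the XML declaration of
// already-serialized output so that it carries standalone="yes".
public class StandalonePostProcessor {

    // An XML declaration at the very start of the text; group 1 holds its
    // pseudo-attributes (version, encoding, standalone).
    private static final Pattern XML_DECL =
            Pattern.compile("<\\?xml\\s+([^?]*?)\\s*\\?>");

    // An existing standalone pseudo-attribute, with its leading whitespace.
    private static final Pattern STANDALONE =
            Pattern.compile("\\s+standalone\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    public static String forceStandaloneYes(String xml) {
        Matcher decl = XML_DECL.matcher(xml);
        if (!decl.lookingAt()) {
            // No declaration at the start: prepend a minimal one.
            return "<?xml version=\"1.0\" standalone=\"yes\"?>\n" + xml;
        }
        // Drop any standalone value already present, then append ours last,
        // which is the position the XML declaration grammar requires.
        String attrs = STANDALONE.matcher(decl.group(1)).replaceAll("");
        return "<?xml " + attrs + " standalone=\"yes\"?>"
                + xml.substring(decl.end());
    }
}
```

The DOCTYPE line and the rest of the document pass through untouched, so
this could be run on whatever XMLSerializer writes into a StringWriter.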

What I was wondering when I initially raised this issue, was whether in
fact there was something in the xml spec which meant this combination
was invalid xml. What I understand from your explanation is that
specifying standalone='yes' is generally a pretty bad/dangerous idea
when there is an external DTD but it is not actually forbidden by the
XML standard. Therefore, I think that it should be possible. And I would
like to note that in my situation, there is no guarantee that the XML I
generate from Xerces is going to be parsed by Xerces...

This isn't a major or urgent issue for me; no customer has yet asked for
this combination. I was working on improving serialization in my
application, and spotted the standalone issue as a possible future
problem for me rather than a current one.

To summarize:

I think the bug raised is actually valid. While inadvisable, it is
_legal_ for an xml document to have standalone=yes and a public or
system ID in the DOCTYPE, and therefore the XMLSerializer class should
not prevent the user (me) from generating such output if they really
want it.
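
As a quick sanity check that a parser will accept such a document, a sketch
along these lines could be used. It relies only on the standard JAXP and
DOM Level 3 calls (so whichever parser JAXP picks up, not necessarily
Xerces); the class name is invented, and the external DTD is replaced by an
empty stub so that nothing is fetched. It shows what one parser accepts,
which is weaker than a statement about the spec itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check: parse a document that has both standalone="yes"
// and a DOCTYPE with a system ID, without validating it.
public class StandaloneWithDoctypeCheck {

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request with an empty stream, so the
        // DTD named in the DOCTYPE does not have to exist anywhere.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
                + "<!DOCTYPE order SYSTEM \"order.dtd\">\n"
                + "<order/>";
        Document doc = parse(xml);
        System.out.println("standalone: " + doc.getXmlStandalone());
        System.out.println("has doctype: " + (doc.getDoctype() != null));
    }
}
```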


Would changing this behaviour (automatic suppression of standalone=yes)
break any existing code out there?

If not, then I would be happy to submit a patch. It will be pretty
trivial [the kind of patch I like]. I would certainly add a note to the
javadoc for method setStandalone advising against using it.

Or we could just close the bug & forget about it; I'm not deeply
emotionally attached to this issue ;-)

Other notes:

Glenn's response raises the issue of whether some kind of Xerces
configuration item (settable feature) is needed to tell Xerces to skip
reading of external entities when standalone=yes, but to read them when
standalone=no. I like this idea. This is the behaviour I (naively)
expected to occur by default, but having it as an optional feature is
reasonable, with the default behaviour instead being the "safest" option
of processing external entities anyway.


