xerces-j-users mailing list archives

From Simon Kitching <si...@ecnetwork.co.nz>
Subject re: bug 14531, XML Serialization and standalone
Date Thu, 14 Nov 2002 21:28:58 GMT
Thanks for the excellent explanation, Glenn.

Clearly the issue of what to do when *parsing* an xml document which has
standalone=yes is not trivial, and the results will be parser-dependent,
and parser-configuration-dependent. It is very useful to know this. And
very interesting to know that "validation=false" in Xerces really just
suppresses error reporting, rather than making Xerces a non-validating
parser. That explains a few things...

However the issue I face is *serialization* of xml. Basically, I need to
read xml generated by customer A, transform it using xslt and serialize
it into a format as specified by customer B.

Now, if customer B says that they want standalone=yes and a doctype with
a public or system ID, there is currently *no way* to generate this
output using the XMLSerializer class. If there is a public or system ID,
then any standalone attribute I specify in the OutputFormat is ignored.
What I would need to do is post-process the output (sed??) to add the
standalone attribute!
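For what it's worth, the post-processing workaround could be done in Java rather than sed. This is only a rough sketch: the class and method names (StandalonePostProcessor, addStandalone) are made up for illustration, and it assumes the serialized document is held in a String and starts with an ordinary XML declaration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: patches standalone="yes" into already-serialized XML. */
public class StandalonePostProcessor {

    // Matches an XML declaration at the very start of the document.
    // The "\\s+" after "xml" keeps this from matching <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL =
            Pattern.compile("^<\\?xml\\s+([^?]*?)\\s*\\?>");

    public static String addStandalone(String xml) {
        Matcher m = XML_DECL.matcher(xml);
        if (!m.find()) {
            // No declaration at all: prepend a minimal one.
            return "<?xml version=\"1.0\" standalone=\"yes\"?>\n" + xml;
        }
        String attrs = m.group(1);
        if (attrs.contains("standalone")) {
            // A standalone declaration is already present; leave it untouched.
            return xml;
        }
        // standalone must come last, after version and the optional encoding.
        return "<?xml " + attrs + " standalone=\"yes\"?>" + xml.substring(m.end());
    }
}
```

Calling StandalonePostProcessor.addStandalone(serializedXml) on the serializer's output would leave the DOCTYPE and the document body as they were and only rewrite the declaration.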

What I was wondering when I initially raised this issue, was whether in
fact there was something in the xml spec which meant this combination
was invalid xml. What I understand from your explanation is that
specifying standalone='yes' is generally a pretty bad/dangerous idea
when there is an external DTD but it is not actually forbidden by the
XML standard. Therefore, I think that it should be possible. And I would
like to note that in my situation, there is no guarantee that the XML I
generate from Xerces is going to be parsed by Xerces...

This isn't a major or urgent issue for me; no customer has yet asked for
this combination. I was working on improving serialization in my
application, and spotted the standalone issue as a possible future
problem for me rather than a current one.

To summarize:

I think the bug raised is actually valid. While inadvisable, it is
_legal_ for an xml document to have standalone=yes and a public or
system ID in the DOCTYPE, and therefore the XMLSerializer class should
not prevent the user (me) from generating such output if they really
want it.
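To make the combination concrete, here is a small sketch that parses such a document through the standard JAXP API. The sample document, the order.dtd system ID and the class name are invented for illustration. The entity resolver substitutes an empty DTD so that nothing is read from disk or the network. It only shows that a (non-validating) parser accepts the combination; it says nothing about what XMLSerializer emits.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative only: standalone="yes" together with a DOCTYPE system ID. */
public class StandaloneWithDoctype {

    // Hypothetical sample; "order.dtd" does not need to exist.
    public static final String SAMPLE =
            "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
          + "<!DOCTYPE order SYSTEM \"order.dtd\">\n"
          + "<order/>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve every external entity to an empty stream, so the
        // external subset is treated as an empty DTD.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

After StandaloneWithDoctype.parse(StandaloneWithDoctype.SAMPLE), the DOM Level 3 method Document.getXmlStandalone() reports the declared value, and getDoctype() is non-null.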

Questions:

Would changing this behaviour (automatic suppression of standalone=yes)
break any existing code out there?

If not, then I would be happy to submit a patch. It will be pretty
trivial [the kind of patch I like]. I would certainly include a note in
the javadoc for method setStandalone advising against this combination.

Or we could just close the bug & forget about it; I'm not deeply
emotionally attached to this issue ;-)

Other notes:

Glenn's response raises the issue of whether some kind of Xerces
configuration item (settable feature) is needed to tell Xerces to skip
reading of external entities when standalone=yes, but to read them when
standalone=no. I like this idea. This is the behaviour I (naively)
expected to occur by default, but having it as an optional feature is
reasonable, with the default behaviour instead being the "safest" option
of processing external entities anyway.
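Something close to this can be approximated in application code today. The sketch below is not a Xerces feature: it peeks at the XML declaration with a regular expression and then toggles the existing Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd accordingly. The class name is made up, it assumes the document is available as a String and that the JAXP implementation in use is Xerces-based (so the feature is recognised), and it only affects the external DTD subset when validation is off.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical wrapper: skip the external DTD only when standalone="yes". */
public class StandaloneAwareParser {

    // Existing Xerces feature; honoured only when validation is off.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Crude check of the XML declaration at the start of the document.
    private static final Pattern STANDALONE_YES =
            Pattern.compile("^<\\?xml\\s[^?]*standalone\\s*=\\s*[\"']yes[\"']");

    public static boolean declaresStandalone(String xml) {
        return STANDALONE_YES.matcher(xml).find();
    }

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // standalone="yes": do not load the external subset.
        // Otherwise: keep the default of loading it.
        factory.setFeature(LOAD_EXTERNAL_DTD, !declaresStandalone(xml));
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }
}
```

A real feature inside the parser would be cleaner, since the parser already knows the standalone value once it has read the declaration; this wrapper has to look at the text twice.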

Regards,

Simon


