Subject: re: bug 14531, XML Serialization and standalone
From: Simon Kitching
To: xerces-j-user@xml.apache.org
Cc: xerces-j-dev@xml.apache.org
Date: 15 Nov 2002 10:28:58 +1300
Message-Id: <1037309339.13035.281.camel@PCSIMON.ecnnz.ecnetwork.co.nz>

Thanks for the excellent explanation, Glenn.

Clearly the issue of what to do when *parsing* an xml document which has standalone=yes is not trivial, and the results will be parser-dependent, and parser-configuration-dependent. It is very useful to know this. And very interesting to know that "validation=false" in Xerces really just suppresses error reporting, rather than making Xerces a non-validating parser. That explains a few things...

However the issue I face is *serialization* of xml. Basically, I need to read xml generated by customer A, transform it using xslt and serialize it into a format as specified by customer B.

Now, if customer B says that they want standalone=yes and a doctype with a public or system ID, there is currently *no way* to generate this output using the XMLSerializer class. If there is a public or system ID, then any standalone attribute I specify in the OutputFormat is ignored. What I would need to do is post-process the output (sed??) to add the standalone attribute!
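For illustration, here is a minimal sketch of that post-processing workaround, written in Java rather than sed. It assumes the serialized document is already held in a String and has no leading byte-order mark. The class and method names (StandaloneFixer, forceStandalone) are hypothetical and are not part of Xerces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the XML declaration of an already-serialized
// document so that it carries the requested standalone pseudo-attribute.
final class StandaloneFixer {

    // An XML declaration at the very start of the text; group 1 holds its
    // pseudo-attributes (version, encoding, standalone).
    private static final Pattern XML_DECL =
            Pattern.compile("^<\\?xml(\\s[^?]*?)\\s*\\?>");

    // An existing standalone pseudo-attribute, quoted either way.
    private static final Pattern STANDALONE =
            Pattern.compile("\\s+standalone\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    static String forceStandalone(String xml, boolean standalone) {
        String attr = " standalone=\"" + (standalone ? "yes" : "no") + "\"";
        Matcher decl = XML_DECL.matcher(xml);
        if (!decl.find()) {
            // No declaration present: prepend a minimal one.
            return "<?xml version=\"1.0\"" + attr + "?>\n" + xml;
        }
        // Drop any existing standalone value, then append the new one last,
        // which is where the XML grammar requires it (after encoding).
        String others = STANDALONE.matcher(decl.group(1)).replaceAll("");
        return "<?xml" + others + attr + "?>" + xml.substring(decl.end());
    }
}
```

This only touches the declaration; the DOCTYPE and everything after it pass through unchanged.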
What I was wondering when I initially raised this issue was whether there was in fact something in the xml spec which meant this combination was invalid xml.

What I understand from your explanation is that specifying standalone='yes' is generally a pretty bad/dangerous idea when there is an external DTD, but it is not actually forbidden by the XML standard. Therefore, I think that it should be possible. And I would like to note that in my situation, there is no guarantee that the XML I generate from Xerces is going to be parsed by Xerces...

This isn't a major or urgent issue for me; no customer has yet asked for this combination. I was working on improving serialization in my application, and spotted the standalone issue as a possible future problem for me rather than a current one.

To summarize: I think the bug raised is actually valid. While inadvisable, it is _legal_ for an xml document to have standalone=yes and a public or system ID in the DOCTYPE, and therefore the XMLSerializer class should not prevent the user (me) from generating such output if they really want it.

Questions:

Would changing this behaviour (automatic suppression of standalone=yes) break any existing code out there?

If not, then I would be happy to submit a patch. It will be pretty trivial [the kind of patch I like]. I would certainly include advice in the javadoc for method setStandalone advising against using it.

Or we could just close the bug & forget about it; I'm not deeply emotionally attached to this issue ;-)

Other notes:

Glenn's response raises the issue of whether some kind of Xerces configuration item (settable feature) is needed to tell Xerces to skip reading of external entities when standalone=yes, but to read them when standalone=no. I like this idea. This is the behaviour I (naively) expected to occur by default, but having it as an optional feature is reasonable, with the default behaviour instead being the "safest" option of processing external entities anyway.
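To make the idea concrete, here is a rough sketch of how an application could approximate that behaviour itself, outside the parser. It is not a Xerces feature and not a proposal for how one would be implemented. It assumes a Java 8 or later JDK, uses only the standard JAXP and StAX APIs, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: skips external entities only when the document
// itself declares standalone="yes".
final class StandaloneAwareParser {

    // Opens a StAX reader just to inspect the XML declaration; no further
    // events are pulled, so the DOCTYPE is never processed here.
    static boolean declaresStandalone(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            return reader.isStandalone();
        } finally {
            reader.close();
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (declaresStandalone(xml)) {
            // Resolve every external entity (including the external DTD
            // subset) to empty content instead of fetching it.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
        }
        // Otherwise the parser's default resolution applies, i.e. the
        // "safest" behaviour of reading external entities anyway.
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Reading the declaration twice is wasteful, and a real parser feature would make the decision once while scanning the declaration; the sketch only shows the intended behaviour.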
Regards,

Simon