xml-general mailing list archives

From Edwin Goei <Edwin.G...@eng.sun.com>
Subject Re: Differences in DTD handling between Xerces-J and Crimson
Date Sat, 24 Feb 2001 04:18:35 GMT
David Kellum wrote:
> 
> I'm writing a performance minded server in Java that needs to repeatedly
> parse relatively small (5k) XML documents obtained from a remote
> server.  I don't need or want to have the overhead of any validation in
> this parse.  However, the returned document includes a doctype
> declaration like so:
> 
> <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
> <!DOCTYPE FOO SYSTEM "foo.dtd">
> <FOO>
> ...
> </FOO>
> 
> I don't control this format so I can't get rid of the DOCTYPE
> declaration without performing some error-prone hack like stripping the
> line out before passing it to the parser.
> 
> With Xerces-J I can work around this by passing an instance of the
> following to the SAX 2 interface's XMLReader.setEntityResolver():
> 
> public class NullEntityResolver
>     implements EntityResolver
> {
>     public InputSource resolveEntity( String publicId, String systemId )
>     {
>        return new InputSource( new ByteArrayInputStream( new byte[0] ) );
>     }
> }
> 
> However, I can't seem to do the same with the Crimson 1.1 parser.  Here
> I get the following:
> 
> Exception in thread "main" org.xml.sax.SAXParseException: Relative URI
> "foo.dtd"; can not be resolved without a document URI.
>     at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3035)
>     at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3029)
>     at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2627)
>
> Any suggestions on how I might work around this with Crimson?  Any
> comments on the validity of my approach for dealing with this in
> Xerces-J?

Sounds like a good approach.  Looking at the Crimson code, the parser
tries to resolve the system ID in the doctype declaration.  Because that
system ID is a relative URI, the parser asks for the base URI of the
document, which is null, hence the exception.

Try this: the code that starts the parse has a SAX InputSource object
representing the main document.  Use InputSource.setSystemId() to set
some URI on the main document.  I believe an empty string ("") should
work as well.  HTH,

-Edwin
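
P.S. A minimal sketch of the suggestion above, in case it helps.  It
assumes the JAXP SAXParserFactory picks up whatever parser is installed
(on a stock JDK that is not Crimson, so this shows the pattern rather
than confirming Crimson's behavior).  The class name NoDtdParse, the
helper parseRootName(), the RootNameHandler class, and the placeholder
URI are all made up for illustration.  The resolver plays the same role
as the NullEntityResolver quoted above, but returns an empty character
stream instead of an empty byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoDtdParse {

    /** Resolves every external entity, including the DTD, to empty content. */
    static class NullEntityResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    /** Hypothetical handler that records the name of the first element seen. */
    static class RootNameHandler extends DefaultHandler {
        String rootName;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (rootName == null) {
                rootName = qName;
            }
        }
    }

    /** Parses the document without reading the real DTD; returns the root element name. */
    static String parseRootName(byte[] xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);

        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new NullEntityResolver());

        RootNameHandler handler = new RootNameHandler();
        reader.setContentHandler(handler);

        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        // Give the main document a base URI so that the relative system ID
        // "foo.dtd" can be made absolute. The value is a placeholder; nothing
        // is fetched from it because the resolver above answers every request.
        source.setSystemId("file:///placeholder/doc.xml");

        reader.parse(source);
        return handler.rootName;
    }

    public static void main(String[] args) throws Exception {
        String doc =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?>\n"
            + "<!DOCTYPE FOO SYSTEM \"foo.dtd\">\n"
            + "<FOO><BAR/></FOO>";
        System.out.println(parseRootName(doc.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The only line specific to the workaround is the setSystemId() call; the
rest is ordinary SAX setup.  Whether an empty string works in place of
the placeholder URI is something to verify against the parser in use.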
