xml-general mailing list archives

From David Kellum <dekel...@aol.com>
Subject Differences in DTD handling between Xerces-J and Crimson
Date Fri, 23 Feb 2001 19:34:43 GMT
I'm writing a performance-minded server in Java that needs to repeatedly
parse relatively small (5k) XML documents obtained from a remote
server.  I neither need nor want the overhead of any validation in
this parse.  However, the returned document includes a doctype
declaration like so:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

I don't control this format so I can't get rid of the DOCTYPE
declaration without performing some error-prone hack like stripping the
line out before passing it to the parser. 

With Xerces-J I can work around this by passing an instance of the
following class to the SAX2 interface's XMLReader.setEntityResolver():

public class NullEntityResolver 
    implements EntityResolver 
{
    public InputSource resolveEntity( String publicId, String systemId )
    {
       return new InputSource( new ByteArrayInputStream( new byte[0] ) );
    }
}
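
For reference, a minimal sketch of how a resolver like the one above might
be installed on an XMLReader obtained through JAXP. This is not the actual
TestParse code: the class name ResolverDemo, the helper countElements, and
the sample document (including its root element name) are hypothetical, and
the resolver is written as an anonymous class only to keep the example
self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ResolverDemo {
    // Hypothetical sample document; the real one comes from the remote server.
    static final String SAMPLE =
        "<?xml version=\"1.0\" standalone=\"no\"?>\n"
      + "<!DOCTYPE foo SYSTEM \"foo.dtd\">\n"
      + "<foo><bar/></foo>";

    // Parses the given XML without reading any external DTD and
    // returns the number of start tags seen.
    static int countElements(String xml) throws Exception {
        final int[] count = {0};
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Answer every external entity request with an empty stream,
        // so the parser never tries to fetch foo.dtd.
        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new ByteArrayInputStream(new byte[0]));
            }
        });

        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(SAMPLE));
    }
}
```

Which parser implementation SAXParserFactory returns depends on the
classpath and JAXP configuration, so whether the resolver is consulted
before the system identifier is resolved may differ between parsers.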

However, I can't seem to do the same with the Crimson 1.1 parser.  Here
I get the following:

Exception in thread "main" org.xml.sax.SAXParseException: Relative URI
"foo.dtd"; can not be resolved without a document URI.
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3035)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3029)
    at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2627)
    at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
    at TestParse.main(TestParse.java:39)

Why do I want to use Crimson, you ask?  For this document type I'm seeing
it parse twice as fast as Xerces-J 1.3. (After the Java 1.3
-server VM has time to warm up.)

Any suggestions on how I might work around this with Crimson?  Any
comments on the validity of my approach for dealing with this in

