xml-general mailing list archives

From Mark Smolen <msmo...@armature.com>
Subject RE: Differences in DTD handling between Xerces-J and Crimson
Date Tue, 27 Feb 2001 02:50:31 GMT
Craig,

Very interesting.  I was certain that we needed an EntityResolver
but that it didn't work properly.  Thanks.

-----Original Message-----
From: David Kellum [mailto:dekellum@aol.com]
Sent: Friday, February 23, 2001 11:35 AM
To: general@xml.apache.org
Subject: Differences in DTD handling between Xerces-J and Crimson


I'm writing a performance-minded server in Java that needs to repeatedly
parse relatively small (5k) XML documents obtained from a remote
server.  I don't need or want to have the overhead of any validation in
this parse.  However, the returned document includes a doctype
declaration like so:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE FOO SYSTEM "foo.dtd">
<FOO>
...
</FOO>

I don't control this format so I can't get rid of the DOCTYPE
declaration without performing some error-prone hack like stripping the
line out before passing it to the parser. 

With Xerces-J I can work around this by passing an instance of the
following to the SAX2 interface's XMLReader.setEntityResolver():

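import java.io.ByteArrayInputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Resolves every external entity (including the DTD) to an empty stream.
public class NullEntityResolver
    implements EntityResolver
{
    public InputSource resolveEntity( String publicId, String systemId )
    {
        return new InputSource( new ByteArrayInputStream( new byte[0] ) );
    }
}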
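
A minimal sketch of how a resolver like this might be wired up through
JAXP and SAX2 is shown here. The class name NullResolverDemo and the
helper parseRootName are made up for illustration, and an empty
character stream stands in for the empty byte stream used above. It
only reports the root element name to show that the parse completes
without fetching foo.dtd.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original message.
class NullResolverDemo {

    // Parses the document without validation and returns the root element name.
    static String parseRootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Answer every external entity request (including the DTD) with empty content.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first (root) element
                }
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" standalone=\"no\"?>\n"
                   + "<!DOCTYPE FOO SYSTEM \"foo.dtd\">\n"
                   + "<FOO></FOO>";
        System.out.println(parseRootName(xml));
    }
}
```

The sketch uses whichever SAX parser JAXP selects at run time, so it
says nothing about how Xerces-J 1.3 or Crimson 1.1 in particular behave.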

However, I can't seem to do the same with the Crimson 1.1 parser.  Here
I get the following:

Exception in thread "main" org.xml.sax.SAXParseException: Relative URI "foo.dtd"; can not be resolved without a document URI.
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3035)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3029)
    at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2627)
    at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
    at TestParse.main(TestParse.java:39)
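
Going only by the wording of the exception, the parser appears to want
a document URI to resolve "foo.dtd" against before the EntityResolver
is consulted. A sketch of supplying one with InputSource.setSystemId()
follows. Whether this satisfies Crimson 1.1 is an unverified assumption;
the class name BaseUriDemo, the helper resolvedDtdId, and the base URI
are placeholders for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original message.
class BaseUriDemo {

    // Parses xml with a base URI set on the InputSource and returns the
    // system ID that the parser handed to the EntityResolver for the DTD.
    static String resolvedDtdId(String xml, String baseUri) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        final String[] seen = new String[1];
        reader.setEntityResolver((publicId, systemId) -> {
            seen[0] = systemId;                            // record what was asked for
            return new InputSource(new StringReader(""));  // empty replacement DTD
        });
        reader.setContentHandler(new DefaultHandler());

        InputSource doc = new InputSource(new StringReader(xml));
        doc.setSystemId(baseUri); // gives the parser a document URI for relative IDs
        reader.parse(doc);
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" standalone=\"no\"?>\n"
                   + "<!DOCTYPE FOO SYSTEM \"foo.dtd\">\n"
                   + "<FOO></FOO>";
        // Placeholder base URI; nothing is fetched from it.
        System.out.println(resolvedDtdId(xml, "http://example.invalid/docs/reply.xml"));
    }
}
```

The base URI is never dereferenced here because the document itself is
supplied as a character stream and the resolver answers the DTD request.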


Why do I want to use Crimson, you ask?  For this document type I'm seeing
it perform twice as fast as Xerces-J 1.3 (after the Java 1.3 -server VM
has had time to warm up).

Any suggestions on how I might work around this with Crimson?  Any
comments on the validity of my approach for dealing with this in
Xerces-J?

Thanks,
David

