From msmolen@armature.com Tue Feb 27 02:55:30 2001
Return-Path:
Mailing-List: contact general-help@xml.apache.org; run by ezmlm
Delivered-To: mailing list general@xml.apache.org
Delivered-To: moderator for general@xml.apache.org
Received: (qmail 97904 invoked from network); 27 Feb 2001 02:55:30 -0000
Received: from news.armature.com (HELO cow.armature.com) (194.128.255.228) by h31.sny.collab.net with SMTP; 27 Feb 2001 02:55:30 -0000
Received: by COW with Internet Mail Service (5.5.2653.19) id ; Tue, 27 Feb 2001 02:47:35 -0000
Message-ID: <9C2A9F9565A7D211801C00A0C9CFDF84079A7A@rainier>
From: Mark Smolen
To: "'general@xml.apache.org'"
Subject: RE: Differences in DTD handling between Xerces-J and Crimson
Date: Tue, 27 Feb 2001 02:50:31 -0000
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
X-Spam-Rating: h31.sny.collab.net 1.6.2 0/1000/N

Craig,

Very interesting. I was certain that we needed an EntityResolver but
that it didn't work properly. Thanks.

-----Original Message-----
From: David Kellum [mailto:dekellum@aol.com]
Sent: Friday, February 23, 2001 11:35 AM
To: general@xml.apache.org
Subject: Differences in DTD handling between Xerces-J and Crimson

I'm writing a performance minded server in Java that needs to
repeatedly parse relatively small (5k) XML documents obtained from a
remote server. I don't need or want to have the overhead of any
validation in this parse. However, the returned document includes a
doctype declaration like so:

...

I don't control this format so I can't get rid of the DOCTYPE
declaration without performing some error-prone hack like stripping
the line out before passing it to the parser.

With Xerces-J I can work around this by using the SAX-2 interface's
XMLReader.setEntityResolver() to an instance of the following:

    public class NullEntityResolver implements EntityResolver
    {
        public InputSource resolveEntity( String publicId, String systemId )
        {
            return new InputSource( new ByteArrayInputStream( new byte[0] ) );
        }
    }

However, I can't seem to do the same with the Crimson 1.1 parser.
Here I get the following:

Exception in thread "main" org.xml.sax.SAXParseException: Relative URI "foo.dtd"; can not be resolved without a document URI.
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3035)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3029)
    at org.apache.crimson.parser.Parser2.parseSystemId(Parser2.java:2627)
    at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2605)
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1116)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:488)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:304)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
    at TestParse.main(TestParse.java:39)

Why do I want to use Crimson you ask? For this document type I'm
seeing that it is performing twice as fast as Xerces J 1.3. (After
the Java 1.3 -server VM has time to warm up.)

Any suggestions on how I might work around this with Crimson?

Any comments on the validity of my approach for dealing with this in
Xerces-J?

Thanks,
David

---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org
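
For reference, the error text above says the relative URI "foo.dtd"
cannot be resolved "without a document URI", which suggests the parser
wants a base URI before it ever consults the EntityResolver. One thing
that may be worth trying is giving the InputSource a system ID in
addition to installing the null resolver. The following is only a
minimal sketch under that assumption: it uses whatever JAXP parser the
JDK provides (not Crimson 1.1, so it is not verified against that
parser), and the class name NullResolverDemo, the helper
parseRootName(), the sample document and the placeholder base URI are
all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NullResolverDemo {

    /** Resolves every external entity to an empty stream, as in the message above. */
    static class NullEntityResolver implements EntityResolver {
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new ByteArrayInputStream(new byte[0]));
        }
    }

    /** Hypothetical helper: parses xml without validation and returns the root element name. */
    static String parseRootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new NullEntityResolver());

        // Record the first element seen, just to show the parse completed.
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });

        InputSource source = new InputSource(new StringReader(xml));
        // Assumption: a placeholder base URI gives the parser something to
        // resolve "foo.dtd" against; the resolver then supplies an empty DTD.
        source.setSystemId("file:///placeholder/doc.xml");
        reader.parse(source);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document with a relative DOCTYPE system identifier.
        String xml = "<?xml version=\"1.0\"?>\n"
                   + "<!DOCTYPE doc SYSTEM \"foo.dtd\">\n"
                   + "<doc><item/></doc>";
        System.out.println(parseRootName(xml));
    }
}
```

The system ID here never has to point at a real file, since the
resolver intercepts the DTD lookup; it only serves as a base for
turning "foo.dtd" into an absolute URI.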