xerces-j-users mailing list archives

From "Mike O'Leary" <tm-ole...@comcast.net>
Subject Ignoring missing end tag errors
Date Wed, 18 Apr 2007 08:05:55 GMT
I wrote an XML parser using SAXParser. It turns out that the XML files I
need to parse are somewhat noisy, and in some cases a start tag has no
matching end tag. I would like to catch these errors as soon as they occur
and proceed as if the end tag had been read (in the cases I have looked at,
the missing end tag causes no ambiguity), but I don't see how to do that. The
documentation for the DefaultHandler class, which I am using to define my
handler functions, says that it supports the functions error, fatalError and
warning, but when my parser hits a place where an end tag is missing, the
underlying parser code throws an exception instead of calling any of these
functions, and I don't see how to catch that exception in a way that would
allow the parser to continue reading the XML file. The error message and
call stack look like this:

org.xml.sax.SAXParseException: The element type "P" must be terminated by the matching end-tag "</P>".
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1219)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1685)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
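A minimal sketch for checking which handler method sees this error, assuming the JDK's bundled SAXParserFactory and a one-line sample document (the class name FatalErrorProbe and the sample XML are illustrative, not taken from the original code). javax.xml.parsers.SAXParser.parse(InputSource, DefaultHandler) registers the DefaultHandler it is given as the error handler as well as the content handler, and DefaultHandler's own fatalError simply throws the SAXParseException it receives. Overriding fatalError should therefore let you observe the missing end tag just before the parse stops.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: records every fatal error the parser reports to the
// handler, then rethrows it exactly as DefaultHandler's default would.
class FatalErrorProbe extends DefaultHandler {
    final List<SAXParseException> fatalErrors = new ArrayList<>();

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        fatalErrors.add(e);
        throw e; // same behaviour as DefaultHandler.fatalError: the parse ends
    }

    // Parses the given XML text and returns the probe for inspection.
    static FatalErrorProbe parse(String xml) {
        FatalErrorProbe probe = new FatalErrorProbe();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // parse() installs the probe as content handler AND error handler.
            parser.parse(new InputSource(new StringReader(xml)), probe);
        } catch (SAXParseException e) {
            // Malformed input: the exception rethrown from fatalError() ends parse() here.
        } catch (Exception e) {
            // Parser configuration or I/O problems are outside this demo.
            throw new IllegalStateException(e);
        }
        return probe;
    }

    public static void main(String[] args) {
        FatalErrorProbe probe = parse("<doc><P>text</doc>");
        for (SAXParseException e : probe.fatalErrors) {
            System.out.println("line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage());
        }
    }
}
```

Running main should print one line for the missing </P> with its position; the exact message text comes from the parser and may differ between JDK versions and locales.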

Is there a way to define an error handler that ignores certain kinds of
errors and to have it be used instead of, say, the ErrorHandlerWrapper in
this call stack whose fatalError function creates and throws an exception in
all cases? Is it reasonable to want to do this, given that the parser
considers an error of this kind to be fatal?
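On the second question, a handler can freely swallow warning and error, which the parser treats as recoverable, but a missing end tag is a well-formedness error, and the XML 1.0 specification states that once a fatal error is detected a processor must not continue normal processing. The sketch below (TolerantHandler and its fields are hypothetical names) therefore ignores only recoverable errors, keeps a stack of open element names, and on a fatal error records which elements were still open before letting the parse end. Xerces does document a feature, http://apache.org/xml/features/continue-after-fatal-error, but its documentation warns that parser behaviour with it enabled is undetermined, so this sketch does not rely on it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: ignores recoverable errors and warnings, but lets
// fatal (well-formedness) errors stop the parse after noting open elements.
class TolerantHandler extends DefaultHandler {
    final Deque<String> openElements = new ArrayDeque<>();
    final List<String> ignored = new ArrayList<>();
    List<String> openAtFatalError; // innermost first; null if no fatal error

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.push(qName); // qName is the raw tag name, e.g. "P"
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    @Override
    public void warning(SAXParseException e) {
        ignored.add("warning: " + e.getMessage()); // note it and keep parsing
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors (such as validity errors when validating).
        ignored.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        openAtFatalError = new ArrayList<>(openElements); // snapshot, innermost first
        throw e; // the parse cannot resume normally after a fatal error
    }

    // Parses the given XML text and returns the handler for inspection.
    static TolerantHandler parse(String xml) {
        TolerantHandler handler = new TolerantHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Fatal error already recorded in openAtFatalError.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

For the input <doc><P>text</doc>, the fatal error is reported before endElement is called for P, so openAtFatalError holds P then doc, which identifies the element whose end tag was missing. Supplying that end tag would have to happen outside SAX, for example by repairing the input before it reaches the parser.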
