lucene-java-user mailing list archives

From Ian Soboroff <>
Subject Re: which HTML parser is better?
Date Thu, 03 Feb 2005 20:32:06 GMT

One which we've been using can be found at:

We absolutely need to be able to recover gracefully from malformed
HTML and/or SGML.  Most of the nicer SAX/DOM/TLA parsers out there
failed this criterion when we started our effort.  The above one is
kind of SAX-y, but it doesn't fall over at the sight of a real web page.
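The parser linked above isn't named in this message. To illustrate what "SAX-y but forgiving" looks like in practice, here is a minimal sketch that uses a different, lenient callback-style HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`). It is a stand-in for illustration, not necessarily the one referenced above. The class name `LenientTextExtractor` and the method `extractText` are hypothetical. The parser streams events to a callback, and it reports malformed markup through `handleError` rather than throwing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: pulls indexable text out of possibly broken HTML.
public class LenientTextExtractor {

    // SAX-like callback: the parser pushes events to us as it scans the page.
    static class Collector extends HTMLEditorKit.ParserCallback {
        final List<String> text = new ArrayList<>();
        int errors = 0;

        @Override
        public void handleText(char[] data, int pos) {
            // Each run of character data between tags arrives here.
            text.add(new String(data));
        }

        @Override
        public void handleError(String errorMsg, int pos) {
            // Malformed markup is reported here instead of aborting the parse.
            errors++;
        }
    }

    public static String extractText(String html) throws IOException {
        Collector collector = new Collector();
        // ignoreCharSet = true: don't abort if the page declares a charset in a <meta> tag.
        new ParserDelegator().parse(new StringReader(html), collector, true);
        return String.join(" ", collector.text).trim();
    }

    public static void main(String[] args) throws IOException {
        // Unclosed <p> and <b>, a stray <td>, and an unmatched </table>.
        String broken = "<html><body><p>Hello <b>world<td>stray cell</table>";
        System.out.println(extractText(broken));
    }
}
```

In an indexer, the collected text would then be handed to an analyzer or added as a field. The point of the callback style is that a bad tag halfway through a page costs you one error event, not the whole document.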