lucene-java-user mailing list archives

From Ian Soboroff
Subject Re: which HTML parser is better?
Date Fri, 04 Feb 2005 15:36:56 GMT

Oops.  It's in the Google cache and also the Internet Archive Wayback
Machine.  I'll drop the original author a note to let him know that
his links are stale.


"Karl Koch" writes:

> The link does not work.
>> One which we've been using can be found at:
>> We absolutely need to be able to recover gracefully from malformed
>> HTML and/or SGML.  Most of the nicer SAX/DOM/TLA parsers out there
>> failed this criterion when we started our effort.  The above one is
>> kind of SAX-y but doesn't fall over at the sight of a real web page
>> ;-)
