lucene-java-user mailing list archives

From "Karl Koch" <TheRan...@gmx.net>
Subject Re: which HTML parser is better?
Date Fri, 04 Feb 2005 10:21:54 GMT
The link does not work.

> 
> One which we've been using can be found at:
> http://www.ltg.ed.ac.uk/~richard/ftp-area/html-parser/
> 
> We absolutely need to be able to recover gracefully from malformed
> HTML and/or SGML.  Most of the nicer SAX/DOM/TLA parsers out there
> failed this criterion when we started our effort.  The above one is
> kind of SAX-y but doesn't fall over at the sight of a real web page
> ;-)
> 
> Ian
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 



