lucene-java-user mailing list archives

From "Nestel, Frank IZ/HZA-IC4" <neste...@de.ina.com>
Subject Re: Best HTML Parser !!
Date Wed, 26 Feb 2003 09:00:10 GMT
I've had fairly good experiences with JTidy!

But HTMLParser (http://htmlparser.sourceforge.net/)
seems to have the lighter-looking API. It is event-based,
and I may need to parse some large HTML files soon, where
holding a whole DOM in memory might become a problem. Does
anyone have practical experience with HTMLParser?
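HTMLParser's own API is not reproduced here. As a rough illustration of what "event-based" means, the sketch below swaps in the JDK's built-in `javax.swing.text.html.parser.ParserDelegator`, which calls `handleStartTag`, `handleEndTag` and `handleText` on a callback as it reads, so no tree of the whole page is built. The class name `EventHtmlTextExtractor` is hypothetical, and this is not HTMLParser's interface.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: event-based text extraction with the JDK's lenient HTML parser.
public class EventHtmlTextExtractor {

    /** Streams HTML through a callback and collects text runs; no document tree is kept. */
    public static String extractText(Reader html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Set while between <script>/<style> start and end tags.
            private boolean inScriptOrStyle = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) {
                    inScriptOrStyle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) {
                    inScriptOrStyle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inScriptOrStyle) {
                    return; // ignore any text reported inside script/style
                }
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // ignoreCharSet = true: keep parsing even if a <meta> charset tag is seen
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Deliberately sloppy HTML: unclosed <p> tags, no <html>/<body> wrapper.
        String html = "<p>Hello <b>Lucene</b> users<p>unclosed paragraph";
        System.out.println(extractText(new StringReader(html)));
    }
}
```

The callback only ever sees one tag or text run at a time, which is the property that matters for very large pages. The trade-off is that any structure you care about has to be tracked by hand, as the script/style flag does here.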

Thanks
Frank

> -----Original Message-----
> From: petite_abeille [mailto:petite_abeille@mac.com] 
> Sent: Tuesday, February 25, 2003 19:49
> To: Lucene Users List
> Subject: Re: Best HTML Parser !!
> 
> 
> 
> On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote:
> 
> > I have some good experiences with JTidy. It works like a DOM-style
> > XML parser and cleans up the HTML along the way.
> 
> I use JTidy too, both for parsing and clean-up. Works pretty nicely.
> 
> > This is VERY useful, because EVERY HTML page has at least ONE error.
> 
> This rule should be tattooed on every parser's head: out of the 
> laboratory, nothing is compliant. This renders the race for "more 
> compliance" among the different parsers somewhat ridiculous.
> 
> Cheers,
> 
> PA.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
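The claim in the quoted message, that real-world HTML is rarely compliant, is easy to observe with a strict parser. The sketch below does not use JTidy; it uses only the JDK's `javax.xml.parsers.DocumentBuilder` to test whether everyday HTML snippets are well-formed XML, which is what a strict DOM-XML parser demands. The class name `StrictParserCheck` and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: does a strict XML parser accept this markup?
public class StrictParserCheck {

    /** Returns true only if the markup is well-formed XML. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. unclosed tags, mismatched end tags, undeclared entities
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>Hello<br/>world</p>",        // well-formed
            "<p>Hello<br>world</p>",         // <br> never closed
            "<ul><li>one<li>two</ul>",       // implicit </li>, fine in HTML
            "<p>Fish &amp; chips&nbsp;!</p>" // &nbsp; is not predefined in XML
        };
        for (String s : samples) {
            System.out.println(isWellFormedXml(s) + "\t" + s);
        }
    }
}
```

Only the first sample is accepted; the other three are ordinary HTML that a browser renders without complaint. That gap is why a tool that cleans up the markup before building a DOM, as JTidy is described above, is useful when indexing arbitrary pages.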


