lucene-java-user mailing list archives

From petite_abeille <>
Subject Re: Best HTML Parser !!
Date Tue, 25 Feb 2003 18:48:35 GMT

On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote:

> I have had good experiences with JTidy. It works like a DOM XML parser
> and cleans up the HTML along the way.

I use JTidy too, both for parsing and for clean-up. It works pretty nicely.

> This is VERY useful, because EVERY HTML document has at least ONE error.

This rule should be tattooed on every parser's head: outside the
laboratory, nothing is compliant. That makes the race for "more
compliance" among the different parsers somewhat ridiculous.


