lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject HTML parser
Date Fri, 19 Apr 2002 05:28:44 GMT
Hello,

I need to select an HTML parser for the application that I'm writing
and I'm not sure what to choose.
The HTML parser included with Lucene looks flimsy, JTidy looks like a
hack and overkill, using classes written for Swing
(javax.swing.text.html.parser) seems wrong, and I haven't tried David
McNicol's parser (included with Spindle).

Somebody on this list must have done some research on this subject.
Can anyone share some experiences?
Have you found a better HTML parser than any of those I listed above?
If your application deals with HTML, what do you use for parsing it?

Thanks,
Otis




