lucene-java-user mailing list archives

From "shrinath.m" <shrinat...@webyog.com>
Subject Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?
Date Tue, 15 Mar 2011 04:46:53 GMT
I started trying out your suggestions one by one. Thanks to all who
helped.

I used Jericho and found it extremely simple to get started with.

Just wanted to clarify one thing though.
Is there a tool that extracts text from HTML without building a DOM?
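
To make the question concrete: a callback-style parser hands each run of text to a handler as it is encountered, so no document tree is kept in memory. Below is a minimal sketch, assuming the JDK's built-in Swing HTML parser (ParserDelegator) is an acceptable stand-in. It is not one of the libraries suggested in this thread, and the class name StreamingTextExtractor is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects text through parser callbacks, no DOM is built.
public class StreamingTextExtractor {

    public static String extract(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser invokes handleText for each run of character data it sees.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };

        // Third argument 'true' tells the parser to ignore charset declarations,
        // since the Reader has already decoded the bytes.
        new ParserDelegator().parse(html, callback, true);
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>Lucene</b></p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

The returned string could then be passed to a Lucene field for indexing. Whether this is faster than a DOM-based parser on real pages is something to measure, not assume.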


-- 
Regards
Shrinath.M

