lucene-java-user mailing list archives

From starz10de <farag_ah...@yahoo.com>
Subject Index html sites using IndexHtml
Date Sun, 26 Jul 2009 11:24:26 GMT

Hi,

I am indexing a set of HTML websites using Lucene (IndexHtml). The indexer
works fine and I can find the indexed terms, but the problem is that this class
(IndexHtml) indexes all the text in each HTML page, including the
advertisements. I am interested only in the body text, not in the
advertisements or the text of the side links.

Is there a way to solve this problem? Am I using the class incorrectly?
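
To make the goal concrete, here is a minimal sketch of the kind of filtering I have in mind, applied to the page before the text is handed to the indexer. It uses only the Java standard library and does not touch any Lucene API. The class name BodyTextExtractor, the method extractBodyText, and the list of tags treated as non-body content are all hypothetical; which markup actually wraps the advertisements depends on the sites being indexed, and a regular expression is only a rough stand-in for a real HTML parser (it does not handle nested elements of the same name, for example).

```java
import java.util.regex.Pattern;

public class BodyTextExtractor {
    // Assumption: these container tags hold navigation, ads, or other
    // non-body content. Adjust the list for the sites being indexed.
    private static final Pattern NOISE = Pattern.compile(
        "(?is)<(script|style|nav|aside|header|footer)\\b[^>]*>.*?</\\1\\s*>");

    // Any remaining tag; its text content is kept, the markup is dropped.
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]+>");

    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Returns the text of the page with the assumed noise blocks removed. */
    public static String extractBodyText(String html) {
        String withoutNoise = NOISE.matcher(html).replaceAll(" ");
        String withoutTags = TAGS.matcher(withoutNoise).replaceAll(" ");
        return SPACES.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><nav><a href=\"/\">Home</a></nav>"
            + "<p>Lucene indexes text.</p><aside>Buy now!</aside></body></html>";
        // Prints: Lucene indexes text.
        System.out.println(extractBodyText(html));
    }
}
```

The string returned by extractBodyText would then be indexed as the document contents in place of the full page text.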




