lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: removing a term from a lucene index
Date Wed, 13 Sep 2006 20:28:52 GMT
Chris Hostetter wrote:
> : > undesired words as a sort of stoplist.  But surely there's a better way
> : > to do it (the inverted index structure seems like this should be
> : > natural).  Any pointers would be most helpful.
>
> I've never given this much thought, but I know that merging indexes can be
> done with IndexReaders, and I know I've seen a FilterIndexReader class
> which lets you proxy to another IndexReader but "filter out" some data by
> overriding methods and using FilterTermDocs, FilterTermEnum and
> FilterTermPositions ... would it be possible to subclass FilterIndexReader
> with an instance that checked the docFreq of each term and ignored it if
> it was less than N ... then just merge your old index into a new (empty)
> index using this FilteredIndexReader...
>
> 	?
>   

http://svn.apache.org/repos/asf/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java
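
For readers who want the shape of the idea before opening that file, here is a
minimal sketch of "filter by docFreq while copying into a new, empty index".
It is a toy model in plain Java, not Lucene code: the index is assumed to be a
map from term to the set of document ids containing it, and the names
TermPruneSketch, pruneRareTerms and minDocFreq are made up for illustration.
A real implementation would do the same check inside a FilterIndexReader
subclass and let the merge copy only the terms that pass.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only (an assumption for illustration, not Lucene API):
// the "index" maps each term to the set of doc ids that contain it.
public class TermPruneSketch {

    // Copy posting lists into a fresh index, skipping every term whose
    // document frequency is below minDocFreq. The source is not modified.
    static Map<String, Set<Integer>> pruneRareTerms(
            Map<String, Set<Integer>> source, int minDocFreq) {
        Map<String, Set<Integer>> target = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> entry : source.entrySet()) {
            int docFreq = entry.getValue().size(); // number of docs with this term
            if (docFreq >= minDocFreq) {
                target.put(entry.getKey(), new TreeSet<>(entry.getValue()));
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        index.put("lucene", new TreeSet<>(Arrays.asList(1, 2, 3)));
        index.put("index", new TreeSet<>(Arrays.asList(1, 3)));
        index.put("teh", new TreeSet<>(Arrays.asList(2))); // rare term, docFreq 1

        Map<String, Set<Integer>> pruned = pruneRareTerms(index, 2);
        System.out.println(pruned.keySet()); // prints [index, lucene]
    }
}
```

The old index is left untouched and the pruned copy is built separately, which
mirrors merging the old index into a new one through a filtering reader
instead of deleting terms in place.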

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




