lucene-java-user mailing list archives

From Shaya Potter <spot...@gmail.com>
Subject Re: easy way to figure out most common tokens?
Date Wed, 15 Aug 2012 18:44:28 GMT
On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
>> Is there an easy way to figure out
>> the most common tokens and then remove those tokens from the
>> documents.
>
> Probably this : http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
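HighFreqTerms reports the highest-frequency terms of an existing index. To show the idea without depending on a particular Lucene version's API, here is a minimal Lucene-free sketch. It assumes the documents are already tokenized into in-memory lists of strings and ranks tokens by document frequency (how many documents contain each token). The class and method names are hypothetical, not part of Lucene.

```java
import java.util.*;

public class CommonTokens {
    // Hypothetical helper: returns the n tokens that occur in the most documents
    // (document frequency). Ties are broken alphabetically so the result is deterministic.
    static List<String> topTokensByDocFreq(List<List<String>> docs, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each token at most once per document.
            for (String tok : new HashSet<>(doc)) {
                docFreq.merge(tok, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq.entrySet());
        // Highest document frequency first, then alphabetical.
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("the", "quick", "fox"),
            List.of("the", "lazy", "dog"),
            List.of("a", "quick", "dog", "the"));
        System.out.println(topTokensByDocFreq(docs, 2)); // prints [the, dog]
    }
}
```

Here "the" appears in all three documents, while "dog" and "quick" tie at two, so the alphabetical tie-break picks "dog".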

Ah, that's a good part 1. The question then becomes how to modify the
index without reindexing all the documents.

My gut says it should be possible (Luke seems to do it), but I've never
gone deep into the Document object beyond adding fields.
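For the removal step, it helps to know that, as far as I know, Lucene does not edit an indexed document's tokens in place. Changing a document amounts to deleting it and adding a new version (IndexWriter.updateDocument does both in one call), and inside Lucene the filtering would usually happen in the analysis chain, e.g. with a StopFilter fed the common terms. The Lucene-free sketch below only shows the filtering itself, assuming documents are in-memory token lists and the common-token set was computed beforehand. The class and method names are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

public class DropCommonTokens {
    // Hypothetical helper: returns a copy of each document with every token in
    // `common` removed, keeping the remaining tokens in their original order.
    static List<List<String>> removeTokens(List<List<String>> docs, Set<String> common) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> doc : docs) {
            out.add(doc.stream()
                       .filter(tok -> !common.contains(tok))
                       .collect(Collectors.toList()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("the", "quick", "fox"),
            List.of("the", "lazy", "dog"));
        Set<String> common = Set.of("the");
        // Each filtered list is what would be re-added in place of the original document.
        System.out.println(removeTokens(docs, common)); // prints [[quick, fox], [lazy, dog]]
    }
}
```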
