lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: easy way to figure out most common tokens?
Date Wed, 15 Aug 2012 18:48:52 GMT
You cannot modify the term dictionary of an index; see my other eMail. You
have to filter it by copying to a new index or by reindexing. In-place
document modifications are not supported by Lucene or by other inverted indexes.
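A minimal, self-contained Java sketch of the two steps discussed in this thread: find the most common terms, then "filter by copying to a new index". It deliberately does not use Lucene's API. ToyIndex, topTermsByDocFreq and copyWithout are hypothetical names for a toy in-memory inverted index whose term dictionary is read-only once built, so dropping terms means copying the surviving postings into a new index rather than editing the old one. Ranking by document frequency mirrors the kind of report HighFreqTerms produces, but this is an illustration of the idea, not that tool's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DropCommonTerms {

    /** Toy, read-only inverted index (hypothetical, not Lucene): term -> ids of docs containing it. */
    static final class ToyIndex {
        private final Map<String, Set<Integer>> postings;

        private ToyIndex(Map<String, Set<Integer>> postings) {
            // Freeze both the term dictionary and every postings list.
            Map<String, Set<Integer>> frozen = new TreeMap<>();
            postings.forEach((term, ids) ->
                    frozen.put(term, Collections.unmodifiableSet(new TreeSet<>(ids))));
            this.postings = Collections.unmodifiableMap(frozen);
        }

        /** "Indexes" pre-tokenized documents; document i gets id i. */
        static ToyIndex build(List<List<String>> docs) {
            Map<String, Set<Integer>> postings = new TreeMap<>();
            for (int docId = 0; docId < docs.size(); docId++) {
                for (String term : docs.get(docId)) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
            return new ToyIndex(postings);
        }

        /** Read-only view of the term dictionary, in sorted order. */
        Set<String> terms() {
            return postings.keySet();
        }

        /** Number of documents containing the term (0 if absent). */
        int docFreq(String term) {
            Set<Integer> ids = postings.get(term);
            return ids == null ? 0 : ids.size();
        }
    }

    /** The n terms with the highest document frequency, ties broken alphabetically. */
    static List<String> topTermsByDocFreq(ToyIndex index, int n) {
        List<String> terms = new ArrayList<>(index.terms());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(index.docFreq(b), index.docFreq(a)); // descending
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    /** Never touches the source index: surviving postings are copied into a new one. */
    static ToyIndex copyWithout(ToyIndex source, Set<String> dropped) {
        Map<String, Set<Integer>> kept = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : source.postings.entrySet()) {
            if (!dropped.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return new ToyIndex(kept);
    }

    public static void main(String[] args) {
        ToyIndex original = ToyIndex.build(List.of(
                List.of("the", "quick", "fox"),
                List.of("the", "lazy", "dog"),
                List.of("the", "fox", "jumps")));
        List<String> common = topTermsByDocFreq(original, 2);
        ToyIndex filtered = copyWithout(original, new HashSet<>(common));
        System.out.println("dropped: " + common);
        System.out.println("remaining: " + filtered.terms());
    }
}
```

Running main prints "dropped: [the, fox]" and "remaining: [dog, jumps, lazy, quick]", while the original index still reports a docFreq of 3 for "the". Sorting the whole dictionary is fine for a toy; for a large term dictionary a bounded priority queue of size n would avoid sorting every term.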

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shaya Potter [mailto:spotter@gmail.com]
> Sent: Wednesday, August 15, 2012 8:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: easy way to figure out most common tokens?
> 
> On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
> >> Is there an easy way to figure out
> >> the most common tokens and then remove those tokens from the
> >> documents.
> >
> > Probably this :
> > http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
> 
> Ah, that's a good part 1. The question then would be how to modify the
> index without reindexing all documents.
> 
> My gut is that it should be possible (it seems Luke does it), but I have
> never gone deep into the Document object beyond adding fields.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



