From: "Uwe Schindler"
Reply-To: java-user@lucene.apache.org
Subject: RE: easy way to figure out most common tokens?
Date: Wed, 15 Aug 2012 20:48:52 +0200

You cannot modify the term dictionary of an index; see my other e-mail. You have to filter it by copying it to a new index, or by reindexing. In-place document modifications are not supported in Lucene or in other inverted indexes.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Shaya Potter [mailto:spotter@gmail.com]
> Sent: Wednesday, August 15, 2012 8:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: easy way to figure out most common tokens?
>
> On 08/15/2012 02:34 PM, Ahmet Arslan wrote:
> >> Is there an easy way to figure out
> >> the most common tokens and then remove those tokens from the
> >> documents.
> >
> > Probably this :
> > http://lucene.apache.org/core/3_6_1/api/all/org/apache/lucene/misc/HighFreqTerms.html
>
> ah, that's a good part 1. Then the Q would then be, how to modify the index
> without reindexing all documents.
>
> my gut is that it should be possible (it seems luke does it), but never went deep
> into the document object besides for adding fields.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
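The point made in this thread (find the most common terms, then get rid of them by building a new index instead of editing the existing term dictionary) can be illustrated with a small toy model. This is a sketch under stated assumptions: it is plain Java and does not use Lucene at all, and the class FilteredReindex and its methods tokenize, docFreq, topTerms and buildIndex are hypothetical names chosen for illustration. It counts document frequencies (roughly the kind of statistic HighFreqTerms lists), picks the top N terms, and then builds a brand-new term-to-document map from the source text with those terms skipped, leaving any existing index untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy illustration only (plain Java, NOT the Lucene API): find the most
 * common terms, then build a fresh inverted index from the source documents
 * with those terms filtered out, instead of editing an existing index.
 */
public class FilteredReindex {

    /** Lower-cases each document and splits it on runs of non-letter/digit characters. */
    static List<List<String>> tokenize(List<String> docs) {
        List<List<String>> out = new ArrayList<>();
        for (String doc : docs) {
            List<String> tokens = new ArrayList<>();
            for (String t : doc.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            out.add(tokens);
        }
        return out;
    }

    /** Document frequency: the number of documents each term occurs in. */
    static Map<String, Integer> docFreq(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) { // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** The n terms with the highest document frequency (ties broken alphabetically). */
    static Set<String> topTerms(Map<String, Integer> df, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(df.entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue()); // higher frequency first
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        Set<String> top = new TreeSet<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    /**
     * "Reindex": build a new term -> doc-id postings map from the source
     * documents, skipping every term in skip. Nothing existing is mutated.
     */
    static SortedMap<String, SortedSet<Integer>> buildIndex(List<List<String>> docs, Set<String> skip) {
        SortedMap<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id)) {
                if (skip.contains(term)) continue;
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> docs = tokenize(List.of(
                "The cat sat on the mat",
                "The dog sat on the log",
                "The cat chased the dog"));
        Set<String> common = topTerms(docFreq(docs), 1);
        System.out.println("most common: " + common);
        System.out.println("filtered index: " + buildIndex(docs, common));
    }
}
```

The design choice mirrors what the thread says about inverted indexes: the filtered postings are produced by re-reading the source documents, not by patching the old structure. With real Lucene, the analogous step would presumably be re-analyzing the original text with a filter that drops those terms (for example a stop-word filter) while writing a new index; that mapping is an assumption, not something stated in the thread.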