From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Retrieving Top Terms for a subset of the index (or for all results of a query)
Date: Sun, 12 Oct 2008 11:38:51 -0400
Message-Id: <59BE1D2A-BC25-4A86-86C5-560B99D28DF7@apache.org>

How large a subset are you talking about?
You might look at the FilteredTermEnum class, but you will probably have to do some work to extend it to do what you want.

If you are talking about a smallish subset (say, at most a couple hundred docs), then I suspect you could store Term Vectors and use the TermVectorMapper.

HTH,
Grant

On Oct 11, 2008, at 6:36 AM, Aleksander M. Stensby wrote:

> Hello everyone. I've been fiddling with the idea of retrieving the
> top terms from a subset of the index (i.e. top terms from the
> documents retrieved by a given search). This could for instance be
> useful to identify top ranking terms in a given date span etc.
>
> It would be something like getting the top 50 terms (like you can do
> with Luke), but instead of doing it for the full index, I would like
> to do the same procedure after applying a filter or a query. I don't
> know if this is a bad explanation or whether it makes any sense at
> all...
>
> So, I really want to avoid iterating over all results (obviously),
> so my question is really whether there is a preferred approach for
> doing such analysis, or whether this has been done in a good way before.
>
> Thanks for any help!
>
> Best regards,
> Aleksander
>
> --
> Aleksander M. Stensby
> Senior Software Developer
> Integrasco A/S
> +47 41 22 82 72
> aleksander.stensby@integrasco.no

--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
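
The term-vector suggestion in the reply comes down to summing per-document term frequencies over the matching documents and keeping the most frequent terms. Below is a minimal sketch of just that aggregation step in plain Java. It assumes the per-document frequencies have already been read out (for example from stored term vectors) into maps. The names TopTerms and topTerms are hypothetical and are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class: aggregates term counts for a
// small set of documents and returns the most frequent terms.
final class TopTerms {

    // docTermFreqs: one map per matching document, term -> frequency in that
    // document (assumed to have been extracted beforehand, e.g. from term vectors).
    // n: how many top terms to return.
    static List<Map.Entry<String, Integer>> topTerms(
            List<Map<String, Integer>> docTermFreqs, int n) {
        // Sum the frequency of each term across all documents in the subset.
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> doc : docTermFreqs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Sort by total frequency, highest first; break ties alphabetically
        // so the result is deterministic.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 3);
        doc1.put("index", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 2);
        doc2.put("term", 4);

        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(doc1);
        docs.add(doc2);

        // Prints "lucene 5" then "term 4".
        for (Map.Entry<String, Integer> e : topTerms(docs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting every distinct term is fine for the couple-hundred-document case mentioned in the reply; for a much larger subset a bounded priority queue of size n would likely be the better choice.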