Date: Sat, 11 Oct 2008 12:36:04 +0200
To: "java-user@lucene.apache.org"
Subject: Retrieving Top Terms for a subset of the index (or for all results of a query)
From: "Aleksander M. Stensby"
Organization: Integrasco A/S

Hello everyone.

I've been fiddling with the idea of retrieving the top terms from a subset of the index (i.e.
top terms from the documents retrieved by a given search). This could, for instance, be useful for identifying the top-ranking terms in a given date span.

It would be something like getting the top 50 terms (as you can do with Luke), but instead of doing it for the full index, I would like to run the same procedure after applying a filter or a query. I don't know whether this is a good explanation or whether it makes any sense at all...

Obviously, I want to avoid iterating over all the results, so my question is whether there is a preferred approach for this kind of analysis, or whether it has been done well before.

Thanks for any help!

Best regards,
Aleksander

-- 
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
aleksander.stensby@integrasco.no