From: Akanksha Baid
Date: Tue, 14 Oct 2008 14:19:49 -0500
To: java-user@lucene.apache.org
Subject: Re: distinct field values
Message-ID: <48F4F0D5.5010308@cs.wisc.edu>
In-Reply-To: <867513fe0810140054l6be2bcdcoe1592cd2864816f2@mail.gmail.com>

Is there something I could do
to index the documents differently to accomplish this? Currently I am looking at all the hits to generate the set of tags for the query. If I need to implement the same thing within Lucene, I am not sure I will gain anything performance-wise. Or am I wrong about this?

Anshum wrote:
> Hi,
>
> You could try changing (or extending) TopFieldDocCollector and do your
> processing there (that is what I tried... and it worked fine). But that
> would mean changing the Lucene code a little bit.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Tue, Oct 14, 2008 at 12:53 PM, Akanksha Baid wrote:
>
>> I have indexed multiple documents - each of them has 3 fields (id, tag,
>> text). Is there an easy way to determine the set of tags for a given query
>> without iterating through all the hits?
>> For example, if I have 100 documents in my index and my set of tags = {A, B,
>> C}, query Q on the text field returns 15 docs with tag A, 10 with tag B and
>> none with tag C (total of 25 hits). Is there a way to determine that the set
>> of tags for query Q = {A, B} without iterating through all 25 hits?
>>
>> Any ideas?
>>
>> Thanks!
>> Akanksha
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
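To make the collector suggestion concrete, here is a minimal sketch in plain Java of what "do your processing in the collector" might look like. It uses no Lucene classes: `DistinctTagsSketch`, `TagCollector`, `tagByDocId` and `distinctTags` are all hypothetical names, and the per-document tag array is an assumption standing in for whatever per-document lookup the index offers (in Lucene of that era, possibly a `HitCollector` subclass plus a `FieldCache` string array, though that mapping is a guess and not something the thread confirms). Note that the collector is still called once per hit; the possible gain is that each call is only an array lookup rather than a stored-document load.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: plain Java, no Lucene classes. All names are hypothetical. */
final class DistinctTagsSketch {

    /** Stand-in for a per-hit callback such as the collector mentioned above. */
    static final class TagCollector {
        // Assumption: exactly one tag per document, loaded once up front.
        private final String[] tagByDocId;
        private final Set<String> seen = new TreeSet<>();

        TagCollector(String[] tagByDocId) {
            this.tagByDocId = tagByDocId;
        }

        /** Called once per matching document; does an array lookup only. */
        void collect(int docId) {
            String tag = tagByDocId[docId];
            if (tag != null) {
                seen.add(tag);
            }
        }

        Set<String> distinctTags() {
            return seen;
        }
    }

    /** Feeds the ids of the matching documents through the collector. */
    static Set<String> distinctTags(String[] tagByDocId, int[] matchingDocIds) {
        TagCollector collector = new TagCollector(tagByDocId);
        for (int docId : matchingDocIds) {
            collector.collect(docId);
        }
        return collector.distinctTags();
    }

    public static void main(String[] args) {
        // Toy index: doc ids 0..5 carrying tags A, B and C.
        String[] tagByDocId = {"A", "B", "C", "A", "B", "C"};
        // Pretend query Q matched docs 0, 1, 3 and 4 (tags A and B only).
        int[] hits = {0, 1, 3, 4};
        System.out.println(distinctTags(tagByDocId, hits)); // prints [A, B]
    }
}
```

Whether this beats looping over the hits afterwards would depend on how expensive each per-hit tag lookup is in the real index, so it is worth measuring rather than assuming.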