Date: Thu, 10 Nov 2005 16:30:08 -0800 (PST)
Subject: Re: efficiently finding all terms used on a particular field within Documents matching a query
From: "Matt Magoffin"
To: java-user@lucene.apache.org

> : I'm wondering if there is a more efficient way to accomplish this?
>
> I believe there is -- provided the terms are indexed.
>
> 1) Get yourself a BitSet representing the Documents you are interested in
> (you mentioned having a date range; you can either use a RangeFilter and
> call the bits method directly, or you can do a search using a
> HitCollector).
>
> 2) Look at the code that actually makes RangeFilter work. It iterates
> over a TermEnum between a low and high value. For each term it finds, it
> uses a TermDocs to record the docid. You could do something very similar,
> looping over all terms in the field you want, but instead of recording
> the docid, add the term to your Set object -- if and only if one of the
> docids from the TermDocs is in your BitSet from step #1 above.
>
> ...that should be faster than the approach you have now.

Thank you, I had thought a BitSet was appropriate here somehow. I'll work
on this approach.

-- m@
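For illustration only, the two quoted steps might be sketched in plain Java as below. This is not Lucene code: the SortedMap of postings is a hypothetical stand-in for walking a TermEnum and its TermDocs over one field, and the class and method names (FieldTermCollector, termsUsedIn) are invented for this sketch. The BitSet plays the role of the step #1 result, with one bit set per matching document.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

final class FieldTermCollector {

    /**
     * Collects every term that occurs in at least one matching document.
     *
     * postings: term -> doc ids containing that term in a single field
     *           (assumed stand-in for iterating TermEnum / TermDocs).
     * matching: one bit per document selected by the query (step #1).
     */
    static Set<String> termsUsedIn(SortedMap<String, int[]> postings, BitSet matching) {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                if (matching.get(docId)) {
                    found.add(entry.getKey());
                    break; // one matching document is enough for this term
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three terms across five documents.
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("java", new int[] {0, 3});
        postings.put("lucene", new int[] {1, 2});
        postings.put("search", new int[] {4});

        // Documents 1 and 3 matched the query.
        BitSet matching = new BitSet();
        matching.set(1);
        matching.set(3);

        System.out.println(termsUsedIn(postings, matching));
    }
}
```

The early break reflects the "if and only if one of the docids" condition: once a single matching document is seen, the remaining postings for that term need not be read.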