From: Daniel Noll
To: java-user@lucene.apache.org
Subject: Re: Lucene equivalent of SQL DISTINCT for a specific field's "stored values"
Date: Fri, 27 Jul 2007 12:55:00 +1000
In-Reply-To: <11822265.post@talk.nabble.com>
Message-Id: <200707271255.01042.daniel@nuix.com>

On Friday 27 July 2007 12:50:12 TimF wrote:
> However, obviously this returns the list of distinct terms,
>   Hello , World , Goodbye , Foo , Bar , Mad
>
> not the list of distinct stored values,
>   Hello World , Goodbye World , Foo Bar , Mad Mad Mad Mad World
>
> I could add another field to the index that is not tokenized and then
> enumerate the terms for that new field, but this seems like a hack, and it
> would also add size to the index in that I would be duplicating data for
> the category for each document.

That is certainly how I would do it.  This sort of thing only works with
untokenised fields, unless you have somewhere else you can store the
untokenised version which is quicker to iterate over.

Daniel

-- 
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Ph: +61 2 9280 0699    Fax: +61 2 9212 6902
Web: http://nuix.com/
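
For illustration, here is a minimal sketch of why enumerating terms gives different results for tokenised and untokenised fields. It is a toy model in plain Java and does not use Lucene's API: the class `DistinctValuesSketch` and the method `distinctTerms` are hypothetical names, splitting on whitespace stands in for whatever analyzer the index really uses, and the repeated "Hello World" entry is added only to show the de-duplication.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene code. It mimics the set of distinct
// terms that a field would contribute to an index's term dictionary.
class DistinctValuesSketch {

    // tokenized == true:  each value is split on whitespace (a stand-in for
    //                     an analyzer), so every word becomes its own term.
    // tokenized == false: each whole value is kept as a single term.
    static SortedSet<String> distinctTerms(List<String> values, boolean tokenized) {
        SortedSet<String> terms = new TreeSet<>();
        for (String value : values) {
            if (tokenized) {
                for (String token : value.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        terms.add(token);
                    }
                }
            } else {
                terms.add(value);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Category values from the thread, plus one assumed duplicate.
        List<String> categories = Arrays.asList(
                "Hello World", "Goodbye World", "Foo Bar",
                "Mad Mad Mad Mad World", "Hello World");

        // Tokenised field: distinct words, not distinct values.
        System.out.println(distinctTerms(categories, true));

        // Untokenised field: one term per distinct value.
        System.out.println(distinctTerms(categories, false));
    }
}
```

In this model the tokenised pass yields the six words TimF listed, while the untokenised pass yields the four whole category values. Each whole value is held as a term in addition to the tokenised words, which is the duplication he was worried about.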