From: Erik Hatcher
To: java-user@lucene.apache.org
Subject: Re: Re[4]: md5 keyword field issue
Date: Mon, 20 Jun 2005 12:32:04 -0400

On Jun 20, 2005, at 10:54 AM, catalin-lucene@dazoot.ro wrote:

> Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:
>
>> Now you've just said the same conflicting thing a different way.  You
>> want to cluster but only return one. :)
>>
>
> i think i misunderstood here the Term: cluster.
> so yes, i just want one image returned.

Maybe my interpretation of "cluster" is clouded by the search domain.
In the search domain, cluster means grouping multiple things.

>> If you only want one image returned, then it seems that only indexing
>> the same image once is the way to go.  When you find a duplicate MD5,
>> don't index that as a second document.  You will, instead, update the
>> document by adding additional ALT text and perhaps the additional
>> URL.
>>
>
> this sounds pretty ok !

The tricks are to do a search when indexing to find duplicates, and to
"update" the document by deleting and re-adding it (you'll probably
want to store the field data so you can retrieve it easily and use it
for the new updated document).

The negative to this approach is that you won't know specifically
which page the image was on in results, though you could keep all URLs
that point to it, as a document can have multiple fields named "URL",
for example.

>>> in sql this would be:
>>> select distinct md5, url, alt from table group by md5 order by
>>> score asc;
>>>
>
>
>> This would give you multiple records for the same MD5.  You said
>> above you only want one per MD5.
>>
>
> here i'm afraid you are not correct, because i have GROUP BY MD5
> clause which will return no duplicates.

Sorry, I missed the GROUP BY clause there in my first human parse of
the expression - I was too busy focusing on DISTINCT.
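To make the search-then-update idea above a bit more concrete, here is
a rough sketch of just the merge logic in plain Java. It is not Lucene
code: the map is an in-memory stand-in for the index, and every name in
it (ImageIndexSketch, ImageDoc, addImage, find, size) is made up for
illustration. In a real index, step 1 would be a search on the MD5
keyword field and step 3 a delete followed by a re-add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory stand-in for the index: one "document" per MD5, with
 * multi-valued URL and ALT fields. Hypothetical names, not Lucene API.
 */
final class ImageIndexSketch {

    /** One indexed image; urls and alts play the role of repeated stored fields. */
    static final class ImageDoc {
        final String md5;
        final List<String> urls = new ArrayList<>();
        final List<String> alts = new ArrayList<>();

        ImageDoc(String md5) {
            this.md5 = md5;
        }
    }

    private final Map<String, ImageDoc> docsByMd5 = new LinkedHashMap<>();

    /** Index an image, merging into the existing document when the MD5 is already known. */
    void addImage(String md5, String url, String alt) {
        // Step 1: search for an existing document with this MD5.
        ImageDoc old = docsByMd5.get(md5);

        // Step 2: build the replacement, carrying over the stored field values.
        ImageDoc updated = new ImageDoc(md5);
        if (old != null) {
            updated.urls.addAll(old.urls);
            updated.alts.addAll(old.alts);
        }
        if (!updated.urls.contains(url)) {
            updated.urls.add(url);
        }
        if (alt != null && !alt.isEmpty() && !updated.alts.contains(alt)) {
            updated.alts.add(alt);
        }

        // Step 3: "update" means delete the old document, then add the new one.
        docsByMd5.remove(md5);
        docsByMd5.put(md5, updated);
    }

    /** Look up the single document for an MD5, or null if it was never indexed. */
    ImageDoc find(String md5) {
        return docsByMd5.get(md5);
    }

    /** Number of distinct images (documents) in the stand-in index. */
    int size() {
        return docsByMd5.size();
    }
}
```

Adding the same MD5 twice with different URLs leaves one document that
carries both URLs, which is the "one image returned" behaviour you are
after, at the cost of not knowing which page produced a given hit.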

    Erik