Date: Thu, 14 Dec 2006 15:17:53 -0800 (PST)
From: qaz zaq
Subject: Re: Duplicates removal in search results
To: java-user@lucene.apache.org

Thanks Erick,
Using TermDocs/TermEnum should work. One of my concerns is performance: the search results could reach 100K, so performance may be impacted. One alternative I am considering is to collapse the data at indexing time, but I haven't decided to go that way.

----- Original Message ----
From: Erick Erickson
To: java-user@lucene.apache.org
Sent: Thursday, December 14, 2006 5:49:01 PM
Subject: Re: Duplicates removal in search results

You need to search for all documents with the title you care about, decide
which one to keep, and remove all the others.

You'll probably need a TermDocs/TermEnum to go through all the items in your
index to create the list of documents to remove.

Erick

On 12/14/06, qaz zaq wrote:
>
> How can I remove duplicate records from the search results? That is, I
> have multiple results with the same title in the 'title' field, and I want
> only 1 record per title. How can I achieve that? Thanks!
>
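
For the "1 record per title" requirement discussed above, one option that neither message spells out is to collapse duplicates at query time, in a single pass over the ranked hits. The following is only a minimal sketch under stated assumptions: `Hit` and `dedupeByTitle` are hypothetical names invented for illustration, not Lucene classes or methods, and the sketch assumes the title of each hit has already been loaded from the index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TitleDedup {

    /** Hypothetical stand-in for one search hit; this is NOT a Lucene class. */
    public static final class Hit {
        public final int docId;
        public final String title;
        public final float score;

        public Hit(int docId, String title, float score) {
            this.docId = docId;
            this.title = title;
            this.score = score;
        }
    }

    /**
     * Keeps the first hit seen for each distinct title and preserves input order.
     * If the input is sorted by descending score, the survivor for each title
     * is its best-scoring hit.
     */
    public static List<Hit> dedupeByTitle(List<Hit> rankedHits) {
        Set<String> seenTitles = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // Set.add returns false when the title was already present.
            if (seenTitles.add(hit.title)) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(7, "Lucene in Action", 0.9f));
        hits.add(new Hit(3, "Lucene in Action", 0.8f));
        hits.add(new Hit(5, "Search Basics", 0.5f));

        // Print the doc id and title of each surviving hit.
        for (Hit hit : dedupeByTitle(hits)) {
            System.out.println(hit.docId + " " + hit.title);
        }
    }
}
```

The pass itself is linear in the number of hits, with one hash lookup per hit, so 100K results should be manageable in memory. The sketch does not account for the cost of reading the title of every hit from the index, which may well dominate in practice; that cost is one reason collapsing at indexing time remains worth considering.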