From: James Dunn
Date: Fri, 30 Apr 2004 19:03:40 -0700 (PDT)
To: Lucene Users List
Subject: Re: Preventing duplicate document insertion during optimize
Message-ID: <20040501020340.58035.qmail@web41112.mail.yahoo.com>
In-Reply-To: <4092E587.10704@newsmonster.org>

Kevin,

I have a similar issue. The only solution I have been able to come up
with is, after the merge, to open an IndexReader against the merged
index, iterate over all the docs, and delete duplicate docs based on my
"primary key" field.

Jim

--- "Kevin A. Burton" wrote:
> Let's say you have two indexes, each with the same document literal.
> All the fields hash the same and the document is a binary duplicate
> of a different document in the second index.
>
> What happens when you do a merge to create a 3rd index from the first
> two? I assume you now have two documents that are identical in one
> index. Is there any way to prevent this?
>
> It would be nice to figure out if there's a way to flag a field as a
> primary key so that if a document has already been added, it is just
> skipped.
>
> Kevin
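
A minimal sketch of the post-merge cleanup described above, kept free of Lucene so it runs on its own. It only shows the bookkeeping: walk the documents in order, remember each "primary key" value already seen, and collect the doc numbers of later repeats. The class name DuplicateFinder, the method findDuplicates, and the sample keys are hypothetical. In a real index the keys would be read from each document's key field through the IndexReader, and the returned doc numbers would be passed to the reader's delete call; those calls are assumptions and are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models the "iterate and delete duplicates" pass
// over a merged index using plain collections instead of Lucene classes.
class DuplicateFinder {

    // keys.get(i) stands in for the primary-key value of document number i.
    // Returns the doc numbers that repeat an earlier key, in ascending order.
    // The first document carrying a given key is kept.
    static List<Integer> findDuplicates(List<String> keys) {
        Set<String> seen = new HashSet<String>();
        List<Integer> duplicates = new ArrayList<Integer>();
        for (int docNum = 0; docNum < keys.size(); docNum++) {
            String key = keys.get(docNum);
            if (key == null) {
                // Assumption: documents without a key field are left alone.
                continue;
            }
            // Set.add returns false when the key was already present.
            if (!seen.add(key)) {
                duplicates.add(docNum);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Sample keys as they might appear after merging two indexes.
        List<String> keys = Arrays.asList("doc-1", "doc-2", "doc-1", "doc-3", "doc-2");
        System.out.println(findDuplicates(keys)); // prints [2, 4]
    }
}
```

Keeping the first occurrence and deleting later ones is a design choice made for this sketch; if the second index holds the newer copy, the same loop could instead record the latest doc number per key and delete the earlier ones.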