From: Daniel Noll
Organization: Nuix Pty Ltd
To: java-user@lucene.apache.org
Subject: Re: Document ID shuffling under 2.3.x (on merge?)
Date: Mon, 17 Mar 2008 11:24:42 +1100

On Thursday 13 March 2008 19:46:20 Michael McCandless wrote:
> But, when a normal merge of segments with deletions completes, your
> docIDs will shift. In trunk we now explicitly compute the docID
> shifting that happens after a merge, because we don't always flush
> pending deletes when flushing added docs, but this is all done
> privately to IndexWriter.

I don't need to worry about deleted documents as such things don't exist in
our system, hence the optimisation based on document IDs.

> I'm a little confused: you said optimize() introduces the problem,
> but, it sounds like optimize() should be fixing the problem because
> it compacts all docIDs to match what you were "guessing" outside of
> Lucene? Can you post the full stack trace of the exceptions you're
> hitting?

You're misunderstanding how we're getting the ID, that's all. We're getting
it by calling docCount() (after adding) and subtracting 1, which is
guaranteed to give the right ID at the time of indexing, although of course,
later is another matter entirely. We were operating from now out of date
information which says the IDs don't shift unless you call delete...

Example:

    add document, assume ID 0 (docCount = 1)
    add document, assume ID 1 (docCount = 2)
    add document, FAILS - assumed not added
    re-add document minus reader fields, assume ID 3 (docCount = 4)

So the ID assumptions are correct at this point; when optimize() is called,
it shifts the third document such that it then has ID 2, and our internal
counts become wrong.

I've backported the expungeDeletes() patch into 2.3 and will be testing it
out next; seems it will do more or less what we want, and merging the deleted
document should be relatively quick as it will only ever exist in
DocumentsWriter's in-memory buffer.

Daniel
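
The example in this message can be reproduced with a toy model. This is only a sketch in plain Java, not Lucene code: ToyIndex and its methods are hypothetical names, and it assumes (as the message describes) that a failed add still consumes a docID slot, that docCount() keeps counting that slot, and that optimize() compacts it away.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: docIDs are positions in a list of slots. Not Lucene's implementation. */
final class ToyIndex {
    // One entry per consumed docID. null marks a slot whose add failed part-way
    // (assumption: the slot is consumed and treated as a deleted document).
    private final List<String> slots = new ArrayList<>();

    /** Adds a document. A failed add still consumes a slot; returns whether it succeeded. */
    boolean addDocument(String doc, boolean failPartWay) {
        slots.add(failPartWay ? null : doc);
        return !failPartWay;
    }

    /** Counts every slot, including the deleted one, until optimize() runs. */
    int docCount() {
        return slots.size();
    }

    /** Models optimize(): drops deleted slots, so every later document shifts down. */
    void optimize() {
        slots.removeIf(d -> d == null);
    }

    /** Current docID of a document, or -1 if it is not in the index. */
    int docIdOf(String doc) {
        return slots.indexOf(doc);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("doc0", false);           // assume ID 0 (docCount = 1)
        index.addDocument("doc1", false);           // assume ID 1 (docCount = 2)
        index.addDocument("doc2-full", true);       // FAILS, but its slot is consumed
        index.addDocument("doc2-stripped", false);  // re-add minus reader fields
        int guessedId = index.docCount() - 1;       // 3 (docCount = 4)

        // Prints: before optimize: guessed=3 actual=3
        System.out.println("before optimize: guessed=" + guessedId
                + " actual=" + index.docIdOf("doc2-stripped"));

        index.optimize();

        // Prints: after optimize: guessed=3 actual=2
        System.out.println("after optimize: guessed=" + guessedId
                + " actual=" + index.docIdOf("doc2-stripped"));
    }
}
```

In this model the externally stored ID is right when it is recorded and only goes stale once the failed slot is compacted, which matches the sequence in the example.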