Date: Mon, 13 Jan 2014 15:37:39 -0800
Subject: Re: Unexpected returning false from IndexWriter.tryDeleteDocument
From: Derek Lewis
To: java-user@lucene.apache.org

Hello again,

I've been doing some further investigation, and once again I'm stumped as to
how we could have had this problem happen in the first place. I followed up
on your mention of "one should not rely on when merges might happen", which
seems like good advice. :)

That said... We're always calling close() on IndexWriter before "unlocking
the directory", and close() calls close(true), which waits for merges to
complete. Given that we only ever create an IndexWriter while we have the
directory locked, and we close() before releasing the lock, I'm not sure how
we could have background merges going on that weren't triggered within the
thread currently holding the directory lock.

The mystery is still what, on the current thread, could be triggering
merges. As mentioned in the original post, up to the point that we're
getting the "false" return value, the only write operation we've done is
tryDeleteDocument.

I think it's a mystery that's going to remain unsolved unless we see this
happen again. I will be investigating turning on the infoStream. :)

Thanks,
Derek

On Sat, Jan 4, 2014 at 2:21 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> On Thu, Jan 2, 2014 at 7:53 PM, Derek Lewis wrote:
> > Sorry for the delay responding. Holidays and all that. :)
>
> No problem.
>
> > The retry approach did work, our process finished in the end. At some
> > point, I suppose we'll just live with the chance this might happen and
> > dump a bunch of exceptions into the log, if the effort to fix it is too
> > high. Being pragmatic and all.
>
> Fair enough :) I do think retry is a valid approach.
>
> > You are correct that preventing the duplicate indexing is hard. We do
> > have things in place to try to prevent it, emphasis on the "try".
> > Occasionally, things go wrong and we get a small number of duplicates,
> > but on at least one occasion that number was anything but small. ;)
> >
> > I'm as sure as I can be that there were no merges running, since we're
> > locking that directory before running this process. All our things that
> > index use that same lock, so unless merges happen in a background thread
> > within Lucene, rather than the calling thread that's adding new documents
> > to the index, there should be no merges going on outside of this lock.
> > In that case, calling waitForMerges shouldn't have any effect.
>
> Merging does run in a background thread by default
> (ConcurrentMergeScheduler), and a still-running merge could be ongoing
> when you "lock that directory".
>
> I don't think IndexWriter kicks off merges on init today, but it's
> free to (it's an impl detail).
>
> Net/net one should not rely on when merges might happen...
>
> > I know you've mentioned the infoStream a couple times :) But I don't
> > think turning it on would be a good idea, in our case. We've only had
> > this problem crop up once, so there's no guarantee at all that it'll
> > happen again, and the infoStream logging would be a lot of data with all
> > the indexing we're doing. Unfortunately, I just don't think it's
> > feasible.
>
> In fact infoStream doesn't generate THAT much data: it doesn't log for
> every added doc. Only when segment changes happen (a flush, a merge,
> deletes applied, etc.). And it can be very useful in post-mortem to
> figure out what happened when something goes wrong.
>
> > Thanks very much for the suggestion about FilterIndexReader with
> > addIndices. That sounds very promising. I'm going to investigate doing
> > our duplicate filtering that way instead.
> >
> > Thanks again for the help. Cheers :)
>
> You're welcome!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
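
For anyone reading this thread later, here is a rough sketch of the "don't rely on when merges happen" advice applied to deletes: attempt the delete by document ID, and if it reports false, fall back to a delete that does not depend on the segment still existing. Nothing below is Lucene code. DeleteTarget, deleteByKey, DeleteWithFallback and FakeTarget are hypothetical stand-ins invented for illustration, so that the sketch compiles without Lucene on the classpath; in a real indexer the two calls would presumably map to IndexWriter.tryDeleteDocument and a delete by Term or Query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two kinds of delete discussed in the thread. */
interface DeleteTarget {
    /** Fast path by document ID; returns false when the delete could not be applied. */
    boolean tryDeleteDocument(int docId);

    /** Slow path (assumed name): delete by a unique key, independent of segments. */
    void deleteByKey(String key);
}

final class DeleteWithFallback {
    private DeleteWithFallback() {}

    /** Returns true if the fast path succeeded, false if the fallback was used. */
    static boolean delete(DeleteTarget target, int docId, String key) {
        if (target.tryDeleteDocument(docId)) {
            return true;
        }
        // The fast path refused (for example, the segment was merged away),
        // so delete by key instead of retrying or failing.
        target.deleteByKey(key);
        return false;
    }
}

/** Test double that records calls and can simulate a merged-away segment. */
final class FakeTarget implements DeleteTarget {
    final List<String> log = new ArrayList<>();
    private final boolean segmentMergedAway;

    FakeTarget(boolean segmentMergedAway) {
        this.segmentMergedAway = segmentMergedAway;
    }

    @Override
    public boolean tryDeleteDocument(int docId) {
        log.add("tryDelete(" + docId + ")");
        return !segmentMergedAway;
    }

    @Override
    public void deleteByKey(String key) {
        log.add("deleteByKey(" + key + ")");
    }
}

final class FallbackDemo {
    public static void main(String[] args) {
        FakeTarget live = new FakeTarget(false);
        System.out.println(DeleteWithFallback.delete(live, 7, "id-7") + " " + live.log);

        FakeTarget merged = new FakeTarget(true);
        System.out.println(DeleteWithFallback.delete(merged, 7, "id-7") + " " + merged.log);
    }
}
```

The point of the design is that a false return is treated as a normal outcome rather than an error, so the process does not need to know whether or when a merge ran.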