Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: local policy includes SPF record at
 spf.trusted-forwarder.org)
MIME-Version: 1.0
In-Reply-To: 
 <CAL8PwkbQ_4gs6ATPx=1P-ZBpuckELUaUQz10-+_EsfAN1yemgQ@mail.gmail.com>
References: 
 <CAGz9in5PfoC8j_xRy+4t_q30F9kq4W0vSDABMdeAvd=ZwqnKhA@mail.gmail.com>
	<CAL8PwkZ4SvGsOuSrHbj=e-SvdJ9yMfS_mc5r0p59C3cfG5g+WA@mail.gmail.com>
	<CAGz9in7kfVrL+eY84P7Et_YboS9vcGFsKyPc6HJxAYCRGgDnaw@mail.gmail.com>
	<CAL8PwkYZDLzrWEVbpL1wB_gFT9CQrwmFOFp0k_udzXJnaVe8ug@mail.gmail.com>
	<CAGz9in7rHbFvR-0M49GCKw7_390AS5+i5Yt_DV1JJo4VRybG7w@mail.gmail.com>
	<CAL8PwkbyeAecxVSDiVpnzo+NjfTK4G0HRCKZ_eNgaymZBXhquA@mail.gmail.com>
	<CAGz9in646vki41K_xTwRXh9dLO3eQATuVHeCVbdOywy3d=JJ3A@mail.gmail.com>
	<CAL8PwkbQ_4gs6ATPx=1P-ZBpuckELUaUQz10-+_EsfAN1yemgQ@mail.gmail.com>
Date: Mon, 13 Jan 2014 15:37:39 -0800
Message-ID: 
 <CAGz9in4VS1OoGxyej3snFW2KX-eXCrs_UBV2ydoGv9J4kogwBA@mail.gmail.com>
Subject: Re: Unexpected returning false from IndexWriter.tryDeleteDocument
From: Derek Lewis <derek@lewisd.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7bd6c03ebc5cf704efe28fca

--047d7bd6c03ebc5cf704efe28fca
Content-Type: text/plain; charset=ISO-8859-1

Hello again,

I've been doing some further investigation, and once again I'm stumped as to
how this problem could have happened in the first place.

I followed up on your mention of "one should not rely on when merges might
happen", which seems like good advice. :) That said...

We're always calling close() on IndexWriter before "unlocking the
directory", and close() calls close(true), which waits for merges to
complete.  Given that we only ever create an IndexWriter while we have the
directory locked, and we close() before releasing the lock, I'm not sure
how we could have background merges going on that weren't triggered within
the thread currently holding the directory lock.

The mystery is still what, on the current thread, could be triggering
merges.  As mentioned in the original post, up to the point that we're
getting the "false" return value, the only write operation we've performed is
tryDeleteDocument.  I think it's a mystery that's going to remain unsolved
unless we see this happen again.  I will be investigating turning on the
infoStream. :)
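
For anyone finding this thread later: the IndexWriter javadoc says that when
tryDeleteDocument returns false, the caller must then delete by Term or Query
instead. Below is a minimal sketch of that fallback shape. The Deleter
interface and the FakeDeleter class are hypothetical stand-ins so the example
runs without Lucene on the classpath; they are not Lucene types, and the
"segmentsMerged" flag only imitates the case where the doc ID can no longer be
resolved.

```java
import java.util.HashMap;
import java.util.Map;

public class DeleteWithFallback {

    /** Hypothetical stand-in for the two delete paths; NOT a Lucene interface. */
    public interface Deleter {
        /** Fast path: delete by doc ID. Returns false if nothing was deleted. */
        boolean tryDeleteByDocId(int docId);

        /** Slow path: delete by a unique key (analogous to deleting by Term). */
        void deleteByKey(String key);
    }

    /**
     * Tries the fast path first and falls back to the key-based delete.
     * Returns true if the fast path succeeded, false if the fallback was used.
     */
    public static boolean delete(Deleter deleter, int docId, String key) {
        if (deleter.tryDeleteByDocId(docId)) {
            return true;
        }
        deleter.deleteByKey(key);
        return false;
    }

    /** Toy in-memory deleter (assumption, for illustration only). */
    public static class FakeDeleter implements Deleter {
        public final Map<Integer, String> docs = new HashMap<>();
        // Imitates "the segment was merged away": doc IDs stop resolving.
        public boolean segmentsMerged = false;

        @Override
        public boolean tryDeleteByDocId(int docId) {
            if (segmentsMerged) {
                return false;
            }
            return docs.remove(docId) != null;
        }

        @Override
        public void deleteByKey(String key) {
            docs.values().removeIf(key::equals);
        }
    }

    public static void main(String[] args) {
        FakeDeleter deleter = new FakeDeleter();
        deleter.docs.put(0, "a");
        deleter.docs.put(1, "b");

        System.out.println(delete(deleter, 0, "a")); // fast path
        deleter.segmentsMerged = true;
        System.out.println(delete(deleter, 1, "b")); // fallback path
        System.out.println(deleter.docs.isEmpty());
    }
}
```

The point of the shape is that a false return is treated as "use the slower
path" rather than as an error, so the outcome does not depend on when a merge
happened.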

Thanks,
Derek


On Sat, Jan 4, 2014 at 2:21 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Thu, Jan 2, 2014 at 7:53 PM, Derek Lewis <derek@lewisd.com> wrote:
> > Sorry for the delay responding.  Holidays and all that. :)
>
> No problem.
>
> > The retry approach did work, our process finished in the end.  At some
> > point, I suppose we'll just live with the chance this might happen and dump
> > a bunch of exceptions into the log, if the effort to fix it is too high.
> > Being pragmatic and all.
>
> Fair enough :)  I do think retry is a valid approach.
>
> > You are correct that preventing the duplicate indexing is hard.  We do have
> > things in place to try to prevent it, emphasis on the "try".  Occasionally,
> > things go wrong and we get a small number of duplicates, but on at least
> > one occasion that number was anything but small. ;)
> >
> > I'm as sure as I can be that there were no merges running, since we're
> > locking that directory before running this process. All our things that
> > index use that same lock, so unless merges happen in a background thread
> > within Lucene, rather than the calling thread that's adding new documents
> > to the index, there should be no merges going on outside of this lock.  In
> > that case, calling waitForMerges shouldn't have any effect.
>
> Merging does run in a background thread by default
> (ConcurrentMergeScheduler), and a still-running merge could be ongoing
> when you "lock that directory".
>
> I don't think IndexWriter kicks off merges on init today, but it's
> free to (it's an impl detail).
>
> Net/net one should not rely on when merges might happen...
>
> > I know you've mentioned the infoStream a couple times :) But I don't think
> > turning it on would be a good idea, in our case.  We've only had this
> > problem crop up once, so there's no guarantee at all that it'll happen
> > again, and the infoStream logging would be a lot of data with all the
> > indexing we're doing.  Unfortunately, I just don't think it's feasible.
>
> In fact infoStream doesn't generate THAT much data: it doesn't log for
> every added doc.  Only when segment changes happen (a flush, a merge,
> deletes applied, etc.).  And it can be very useful in post-mortem to
> figure out what happened when something goes wrong.
>
> > Thanks very much for the suggestion about FilterIndexReader with
> > addIndexes.  That sounds very promising.  I'm going to investigate doing
> > our duplicate filtering that way instead.
> >
> > Thanks again for the help.  Cheers :)
>
> You're welcome!
>
> Mike McCandless
>
> http://blog.mikemccandless.com

--047d7bd6c03ebc5cf704efe28fca--