lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Unexpected returning false from IndexWriter.tryDeleteDocument
Date Tue, 14 Jan 2014 10:05:32 GMT
It's also entirely possible you're hitting an unexpected case or bug,
where .tryDeleteDocument could have done the delete but failed ... who
knows :)

But infoStream is the next step.  It's trivial to turn on, e.g. just
call IndexWriterConfig.setInfoStream(new
PrintStreamInfoStream(System.out)) in the IWC that you pass to IW.

I'm curious why you're getting unexpected false back...

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jan 13, 2014 at 6:37 PM, Derek Lewis <derek@lewisd.com> wrote:
> Hello again,
>
> I've been doing some further investigation, and once again, I'm stumped as
> to how we could have had this problem happen in the first place.
>
> I followed up on your mention of "one should not rely on when merges might
> happen", which seems like good advice. :) That said...
>
> We're always calling close() on IndexWriter before "unlocking the
> directory", and close() calls close(true), which waits for merges to
> complete.  Given that we only ever create an IndexWriter while we have the
> directory locked, and we close() before releasing the lock, I'm not sure
> how we could have background merges going on that weren't triggered within
> the thread currently holding the directory lock.
>
> The mystery is still what, on the current thread, could be triggering
> merges.  As mentioned in the original post, up to the point that we're
> getting the "false" return value, the only write operations we've done are
> tryDeleteDocument.  I think it's a mystery that's going to remain unsolved
> unless we see this happen again.  I will be investigating turning on the
> infoStream. :)
>
> Thanks,
> Derek
>
>
>
> On Sat, Jan 4, 2014 at 2:21 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> On Thu, Jan 2, 2014 at 7:53 PM, Derek Lewis <derek@lewisd.com> wrote:
>> > Sorry for the delay responding.  Holidays and all that. :)
>>
>> No problem.
>>
>> > The retry approach did work; our process finished in the end.  At some
>> > point, if the effort to fix it is too high, I suppose we'll just live
>> > with the chance that this might happen and dump a bunch of exceptions
>> > into the log.  Being pragmatic and all.
>>
>> Fair enough :)  I do think retry is a valid approach.
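A minimal sketch of such a retry loop, in plain Java so it runs without Lucene on the classpath. It assumes the delete is wrapped in a `BooleanSupplier` that returns whatever `tryDeleteDocument` returned, plus a `refresh` callback run between failed attempts (for example, reopening the near-real-time reader and looking the document up again). `RetryDelete`, `retryUntilTrue` and both callbacks are hypothetical names for illustration, not Lucene API.

```java
import java.util.function.BooleanSupplier;

public class RetryDelete {

    /**
     * Hypothetical helper (not part of Lucene): calls {@code attempt} up to
     * {@code maxAttempts} times and runs {@code refresh} between failed
     * attempts. Returns true as soon as one attempt succeeds, false if all fail.
     */
    static boolean retryUntilTrue(BooleanSupplier attempt, Runnable refresh, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                // Assumption: e.g. reopen the reader so the next try sees current segments.
                refresh.run();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated tryDeleteDocument: returns false twice, then true.
        int[] calls = {0};
        boolean ok = retryUntilTrue(() -> ++calls[0] >= 3, () -> {}, 5);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```

Bounding the number of attempts keeps a persistent failure from looping forever; once the bound is hit, the caller can fall back to logging the exception, as described above.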
>>
>> > You are correct that preventing the duplicate indexing is hard.  We do
>> > have things in place to try to prevent it, emphasis on the "try".
>> > Occasionally, things go wrong and we get a small number of duplicates,
>> > but on at least one occasion that number was anything but small. ;)
>> >
>> > I'm as sure as I can be that there were no merges running, since we're
>> > locking that directory before running this process.  All of our indexing
>> > processes use that same lock, so unless merges happen in a background
>> > thread within Lucene, rather than in the calling thread that's adding new
>> > documents to the index, there should be no merges going on outside of
>> > this lock.  In that case, calling waitForMerges shouldn't have any effect.
>>
>> Merging does run in a background thread by default
>> (ConcurrentMergeScheduler), and a still-running merge could be ongoing
>> when you "lock that directory".
>>
>> I don't think IndexWriter kicks off merges on init today, but it's
>> free to (it's an impl detail).
>>
>> Net/net one should not rely on when merges might happen...
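To illustrate that point in plain Java: work handed to a background thread keeps running after the call that started it has returned, and only an explicit wait guarantees that it has finished. In this analogy (not Lucene code) an `ExecutorService` stands in for `ConcurrentMergeScheduler`, and `awaitTermination` stands in for the waiting that `close()` or `waitForMerges()` does; the latch exists only to make the timing deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackgroundMergeDemo {

    /**
     * Returns true if the background "merge" was still unfinished right after
     * it was submitted, and finished once we explicitly waited for it.
     */
    static boolean runDemo() {
        ExecutorService mergeThreads = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // The indexing thread kicks off a "merge" and returns immediately.
            Future<?> merge = mergeThreads.submit(() -> {
                release.await(); // blocks, simulating a long-running merge
                return null;
            });
            boolean unfinishedAfterSubmit = !merge.isDone();

            // Let the merge complete, then wait for it, as close() would.
            release.countDown();
            mergeThreads.shutdown();
            boolean finished = mergeThreads.awaitTermination(10, TimeUnit.SECONDS);
            return unfinishedAfterSubmit && finished && merge.isDone();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            mergeThreads.shutdownNow(); // no-op if already terminated
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // true
    }
}
```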
>>
>> > I know you've mentioned the infoStream a couple of times :) But I don't
>> > think turning it on would be a good idea in our case.  We've only had
>> > this problem crop up once, so there's no guarantee at all that it'll
>> > happen again, and the infoStream logging would be a lot of data with all
>> > the indexing we're doing.  Unfortunately, I just don't think it's feasible.
>>
>> In fact infoStream doesn't generate THAT much data: it doesn't log for
>> every added doc, only when segment changes happen (a flush, a merge,
>> deletes applied, etc.).  And it can be very useful in a post-mortem to
>> figure out what happened when something goes wrong.
>>
>> > Thanks very much for the suggestion about FilterIndexReader with
>> > addIndexes.  That sounds very promising.  I'm going to investigate doing
>> > our duplicate filtering that way instead.
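One way to read that suggestion (an assumption, since the original suggestion is not quoted here): a filtering reader hides unwanted documents much as deletions do, so `addIndexes` never copies them into the target index. The plain-Java sketch below computes only that keep/hide decision. Given one hypothetical unique key per document in docID order, it returns a `BitSet` of documents to keep, retaining the first occurrence of each key. Exposing those bits through a Lucene filter reader is not shown and depends on the Lucene version.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateFilter {

    /**
     * Returns a BitSet with bit i set if document i should be kept:
     * the first document carrying each key is kept, later ones are hidden.
     * keys.get(i) is the (hypothetical) unique-id value of document i.
     */
    static BitSet liveDocs(List<String> keys) {
        BitSet keep = new BitSet(keys.size());
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < keys.size(); doc++) {
            // Set.add returns false when the key has already been seen.
            if (seen.add(keys.get(doc))) {
                keep.set(doc);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        BitSet keep = liveDocs(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(keep); // {0, 1, 3}
    }
}
```

Keeping the first occurrence is an arbitrary choice here; keeping the last (for example, the most recently indexed copy) would only change which docID is set for each key.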
>> >
>> > Thanks again for the help.  Cheers :)
>>
>> You're welcome!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


