lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter optimize() deadlock
Date Sat, 17 Oct 2009 09:02:10 GMT
OK then this is exactly what was happening with LUCENE-1573 -- Lucene
was not properly handling the interrupt request, and instead of
throwing an exception, entered a spin loop that starves the merge
threads.

Can you try 2.9.0 (and turn back on your Future.cancel(true) call) and
see if it's fixed?

Mike

On Fri, Oct 16, 2009 at 7:11 PM, Christopher Tignor
<ctignor@thinkmap.com> wrote:
> the doWait() call is synchronized on IndexWriter but it is also, as you
> suggest in a loop in a block synchronized on IndexWriter.
>
> The doWait() call returns immediately, still holding the IndexWriter lock
> from the loop in the synchornized block as my stack trace shows, without
> blocking and giving the merger thread a chance to merge.  It keeps repeating
> this doWait procedure which unfortunately never actually waits and the
> MergerThread is starved....
>
> C>T>
>
> On Fri, Oct 16, 2009 at 6:53 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> I'm glad you worked around it!  But I don't fully understand the
>> issue.  That doWait is inside a sync(writer) block... if the Future
>> manages to interrupt it, then that thread will release the lock when
>> it exits that sync block.
>>
>> Actually, if the thread was indeed interrupted, you may be hitting this:
>>
>>    http://issues.apache.org/jira/browse/LUCENE-1573
>>
>> (which is fixed in 2.9).
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 6:01 PM, Christopher Tignor
>> <ctignor@thinkmap.com> wrote:
>> > I discovered the problem and fixed it's effect on my code:
>> >
>> > Using the source for Lucene version 2.4.1,
>> > in IndexWriter.optimize() there is a call to doWait() on line 2283
>> > this method attempts to wait for a second in order to give other threads
>> it
>> > has spawned a chance to acquire it's mutex and complete index merges.
>> > It checks if there are any merges left after it wakes up.  if there
>> aren't
>> > int proceeds.
>> >
>> > However, using using a Future object associated with this thread and
>> calling
>> > cancel(true) doesn't allow the Thread to enter into Object.wait() (or the
>> > executorservice observing this Future immediately wakes it up or
>> something)
>> > so it returns immediately and the thread holding the lock on the
>> IndexWriter
>> > never sceeds it's lock to threads running a merge.
>> >
>> > I re-wrote my code such that it didn't need to call the interruptible
>> > version of Future.cancel() to solve it, i.e. Future.cancel(false)
>> >
>> > thanks,
>> >
>> > C>T>
>> >
>> > On Fri, Oct 16, 2009 at 4:44 PM, Christopher Tignor <
>> ctignor@thinkmap.com>wrote:
>> >
>> >> Indeed it looks like the thread running MergerThread started (After
>> passing
>> >> off to ConcurentMergeScheduler) by the thread calling
>> IndexWriter.optimize()
>> >> is indeed waiting on the mutex for the IndexWriter to be free so it can
>> use
>> >> the object to call mergeInit().
>> >>
>> >> The IndexWriter however has entered a synchronized() waiting loop,
>> waking
>> >> up every second ( in doWait() ) and checking if there are running merges
>> >> left - which of course there are as the thread responsible for doing the
>> >> merging can't get in.  Deadlocked stack traces are below:
>> >>
>> >> Thread [pool-2-thread-5] (Suspended)
>> >>     owns: IndexWriter  (id=71)
>> >>     owns: LuceneResultsPersister  (id=85)
>> >>     IndexWriter.optimize(int, boolean) line: 2283
>> >>     IndexWriter.optimize(boolean) line: 2218
>> >>     IndexWriter.optimize() line: 2198
>> >>     LuceneResultsPersister.commit() line: 97
>> >>     PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer)
>> line:
>> >> 105
>> >>     PersistenceJobConsumer.consume() line: 46
>> >>     PersistenceJobConsumer.run() line: 67
>> >>     Executors$RunnableAdapter<T>.call() line: 441
>> >>     FutureTask$Sync.innerRun() line: 303
>> >>     FutureTask<V>.run() line: 138
>> >>     ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>> >>     ThreadPoolExecutor$Worker.run() line: 908
>> >>     Thread.run() line: 619
>> >>
>> >> Daemon Thread [Lucene Merge Thread #0] (Suspended)
>> >>     waiting for: IndexWriter  (id=71)
>> >>     IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971
>> >>     IndexWriter.merge(MergePolicy$OneMerge) line: 3879
>> >>     ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205
>> >>     ConcurrentMergeScheduler$MergeThread.run() line: 260
>> >>
>> >> I don't really understand how this code is supposed to (and has) worked
>> >> before this and what the problem thus might be here:
>> >>
>> >> In IndexWriter.optimize() at line 2263 we have the synchronized block
>> where
>> >> doWait becomes true using the parameter-less call to optimize():
>> >>
>> >> if (doWait) {
>> >>       synchronized(this) {
>> >>         while(true) {
>> >>           if (mergeExceptions.size() > 0) {
>> >>             // Forward any exceptions in background merge
>> >>             // threads to the current thread:
>> >>             final int size = mergeExceptions.size();
>> >>             for(int i=0;i<size;i++) {
>> >>               final MergePolicy.OneMerge merge = (MergePolicy.OneMerge)
>> >> mergeExceptions.get(0);
>> >>               if (merge.optimize) {
>> >>                 IOException err = new IOException("background merge
hit
>> >> exception: " + merge.segString(directory));
>> >>                 final Throwable t = merge.getException();
>> >>                 if (t != null)
>> >>                   err.initCause(t);
>> >>                 throw err;
>> >>               }
>> >>             }
>> >>           }
>> >>
>> >>           if (optimizeMergesPending())
>> >>             doWait();
>> >>           else
>> >>             break;
>> >>         }
>> >>       }
>> >>
>> >> It is holding this lock while the thread it started to do the mergin is
>> >> trying to call it's mergeInit() method.
>> >>
>> >> Any thoughts?
>> >>
>> >> thanks,
>> >>
>> >> C>T>
>> >>
>> >>
>> >> On Fri, Oct 16, 2009 at 4:11 PM, Christopher Tignor <
>> ctignor@thinkmap.com>wrote:
>> >>
>> >>> After tracing through the lucene source more it seems that what is
>> >>> happening is after I call Future.cancel(true) on my parent thread,
>> >>> optimize() is called and this method launches it's own thread using
a
>> >>> ConcurrentMergeScheduler$MergeThread to do the actual merging.
>> >>>
>> >>> When this Thread comes around to calling mergeInit() on my index writer
>> -
>> >>> a synchronized method - it hangs.  For some reason it seems to no long
>> have
>> >>> the mutex perhaps?  Trace of this thread's stall below...
>> >>>
>> >>> Daemon Thread [Lucene Merge Thread #0] (Suspended)
>> >>>     IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971
>> >>>     IndexWriter.merge(MergePolicy$OneMerge) line: 3879
>> >>>     ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205
>> >>>     ConcurrentMergeScheduler$MergeThread.run() line: 260
>> >>>
>> >>> thanks again,
>> >>>
>> >>> C>T>
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Oct 16, 2009 at 3:53 PM, Michael McCandless <
>> >>> lucene@mikemccandless.com> wrote:
>> >>>
>> >>>> But if the Future.cancel call turns out to be a no-op (simply waits
>> >>>> instead of interrupting the thread), how could it be that the deadlock
>> >>>> only happens when you call it?  Weird.  Are you really sure it's
not
>> >>>> actually calling Thread.interrupt?
>> >>>>
>> >>>> That stack trace looks like a normal "optimize is waiting for the
>> >>>> background merges to complete".  Is it possible your background
merges
>> >>>> are hitting exceptions?  You should see them on your error console
if
>> >>>> so...
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> On Fri, Oct 16, 2009 at 3:17 PM, Christopher Tignor
>> >>>> <ctignor@thinkmap.com> wrote:
>> >>>> > It doesn't look like my Future.cancel(true) is actually interrupting
>> >>>> the
>> >>>> > thread.  It only does so "if necessary" and in this case seems
to be
>> >>>> letting
>> >>>> > the Thread finish gracefully without need for interruption.
>> >>>> >
>> >>>> > The stack trace leading up to the hanging IndexWriter.optimize()
>> method
>> >>>> is
>> >>>> > below, though not terribly useful I imagine.  Here only 5
documents
>> are
>> >>>> > trying to be added to an index taking up only 39.2 MB on disk.
>>  Again
>> >>>> this
>> >>>> > deadlocking only happens after I use the Future for this task
to
>> cancel
>> >>>> the
>> >>>> > task...
>> >>>> >
>> >>>> > Thread [pool-3-thread-5] (Suspended)
>> >>>> >    IndexWriter.doWait() line: 4494
>> >>>> >    IndexWriter.optimize(int, boolean) line: 2283
>> >>>> >    IndexWriter.optimize(boolean) line: 2218
>> >>>> >    IndexWriter.optimize() line: 2198
>> >>>> >    LuceneResultsPersister.commit() line: 97
>> >>>> >    PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer)
>> >>>> line:
>> >>>> > 105
>> >>>> >    PersistenceJobConsumer.consume() line: 46
>> >>>> >    PersistenceJobConsumer.run() line: 67
>> >>>> >    Executors$RunnableAdapter<T>.call() line: 441
>> >>>> >    FutureTask$Sync.innerRun() line: 303
>> >>>> >    FutureTask<V>.run() line: 138
>> >>>> >    ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>> >>>> >    ThreadPoolExecutor$Worker.run() line: 908
>> >>>> >    Thread.run() line: 619
>> >>>> >
>> >>>> > thanks,
>> >>>> >
>> >>>> > C>T>
>> >>>> >
>> >>>> > On Fri, Oct 16, 2009 at 1:58 PM, Michael McCandless <
>> >>>> > lucene@mikemccandless.com> wrote:
>> >>>> >
>> >>>> >> My guess is it's the invocation of Thread.interrupt (which
>> >>>> >> Future.cancel(true) calls if the task is running) that
lead to the
>> >>>> >> deadlock.
>> >>>> >>
>> >>>> >> Is it possible to get the stack trace of the thrown exception
when
>> the
>> >>>> >> thread was interrupted?  Maybe indeed something in IW
isn't
>> cleaning
>> >>>> >> up its state on being interrupted.
>> >>>> >>
>> >>>> >> Mike
>> >>>> >>
>> >>>> >> On Fri, Oct 16, 2009 at 1:43 PM, Christopher Tignor
>> >>>> >> <ctignor@thinkmap.com> wrote:
>> >>>> >> > thanks for getting back.
>> >>>> >> >
>> >>>> >> > I do not lock on the IndexWriter object itself but
all methods in
>> my
>> >>>> >> > consumer class that use IndexWriter are synchronized
(locking my
>> >>>> >> singleton
>> >>>> >> > consumer object itself).
>> >>>> >> > The thread is waiting at IndexWriter.doWait().  What
might cuase
>> >>>> this?
>> >>>> >> >
>> >>>> >> > thanks -
>> >>>> >> >
>> >>>> >> > C>T>
>> >>>> >> >
>> >>>> >> > On Fri, Oct 16, 2009 at 12:58 PM, Uwe Schindler <uwe@thetaphi.de
>> >
>> >>>> wrote:
>> >>>> >> >
>> >>>> >> >> Do you use the IndexWriter as mutex in a synchronized()
block?
>> This
>> >>>> is
>> >>>> >> not
>> >>>> >> >> supported and may hang. Never lock on IndexWriter
instances.
>> >>>> IndexWriter
>> >>>> >> >> itself is thread safe.
>> >>>> >> >>
>> >>>> >> >> -----
>> >>>> >> >> Uwe Schindler
>> >>>> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >>>> >> >> http://www.thetaphi.de
>> >>>> >> >> eMail: uwe@thetaphi.de
>> >>>> >> >>
>> >>>> >> >> > -----Original Message-----
>> >>>> >> >> > From: Christopher Tignor [mailto:ctignor@thinkmap.com]
>> >>>> >> >> > Sent: Friday, October 16, 2009 6:50 PM
>> >>>> >> >> > To: java-user
>> >>>> >> >> > Subject: IndexWriter optimize() deadlock
>> >>>> >> >> >
>> >>>> >> >> > Hello,
>> >>>> >> >> >
>> >>>> >> >> > I am trying to track down the cause of my
code hanging on
>> calling
>> >>>> >> >> > IndexWriter.optimize() at its doWait() method.
>> >>>> >> >> > It appears, thus that it is watiing on other
merges to happen
>> >>>> which is
>> >>>> >> a
>> >>>> >> >> > bit
>> >>>> >> >> > confusing to me:
>> >>>> >> >> >
>> >>>> >> >> > My application is a simple producer consumer
model where
>> >>>> documents are
>> >>>> >> >> > added
>> >>>> >> >> > to a queue by producers and then one consumer
with one
>> >>>> indexwriter
>> >>>> >> (the
>> >>>> >> >> > only
>> >>>> >> >> > in the application) periodically calls addDocument()
on a
>> batch
>> >>>> of
>> >>>> >> these
>> >>>> >> >> > jobs and then calls optimize(), commit().
and then close().
>> >>>>  There is
>> >>>> >> >> only
>> >>>> >> >> > one thread running the consumer so I am confused
as to how the
>> >>>> >> >> indexwriter
>> >>>> >> >> > might be deadlocking itself.  Indeed this
is the only thread
>> >>>> active
>> >>>> >> when
>> >>>> >> >> > the
>> >>>> >> >> > deadlock occurs so it seems to be a problem
of reentry.
>> >>>> >> >> >
>> >>>> >> >> > Importantly, the deadlocking occurs only
when the thread is
>> >>>> trying to
>> >>>> >> >> > shutdown - that is the Thread running this
lucene consumer has
>> a
>> >>>> >> Future
>> >>>> >> >> > that
>> >>>> >> >> > has had its cancel(true) interrupting method
called.  Is it
>> >>>> possible
>> >>>> >> that
>> >>>> >> >> > an
>> >>>> >> >> > internal Lucene lock is obtained during addDocument()
and on
>> >>>> >> interruption
>> >>>> >> >> > is
>> >>>> >> >> > never released so the subsequent optimize()
call hangs?  This
>> >>>> doesn't
>> >>>> >> >> > appear
>> >>>> >> >> > to be happening...
>> >>>> >> >> >
>> >>>> >> >> > Any help appreciated.
>> >>>> >> >> >
>> >>>> >> >> > thanks,
>> >>>> >> >> >
>> >>>> >> >> > C>T>
>> >>>> >> >> >
>> >>>> >> >> > what might I be missing here?
>> >>>> >> >> >
>> >>>> >> >> > --
>> >>>> >> >> > TH!NKMAP
>> >>>> >> >> >
>> >>>> >> >> > Christopher Tignor | Senior Software Architect
>> >>>> >> >> > 155 Spring Street NY, NY 10012
>> >>>> >> >> > p.212-285-8600 x385 f.212-285-8999
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> ---------------------------------------------------------------------
>> >>>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>>> >> >> For additional commands, e-mail:
>> java-user-help@lucene.apache.org
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > --
>> >>>> >> > TH!NKMAP
>> >>>> >> >
>> >>>> >> > Christopher Tignor | Senior Software Architect
>> >>>> >> > 155 Spring Street NY, NY 10012
>> >>>> >> > p.212-285-8600 x385 f.212-285-8999
>> >>>> >> >
>> >>>> >>
>> >>>> >>
>> ---------------------------------------------------------------------
>> >>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>>> >>
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > TH!NKMAP
>> >>>> >
>> >>>> > Christopher Tignor | Senior Software Architect
>> >>>> > 155 Spring Street NY, NY 10012
>> >>>> > p.212-285-8600 x385 f.212-285-8999
>> >>>> >
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> TH!NKMAP
>> >>>
>> >>> Christopher Tignor | Senior Software Architect
>> >>> 155 Spring Street NY, NY 10012
>> >>> p.212-285-8600 x385 f.212-285-8999
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> TH!NKMAP
>> >>
>> >> Christopher Tignor | Senior Software Architect
>> >> 155 Spring Street NY, NY 10012
>> >> p.212-285-8600 x385 f.212-285-8999
>> >>
>> >
>> >
>> >
>> > --
>> > TH!NKMAP
>> >
>> > Christopher Tignor | Senior Software Architect
>> > 155 Spring Street NY, NY 10012
>> > p.212-285-8600 x385 f.212-285-8999
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> TH!NKMAP
>
> Christopher Tignor | Senior Software Architect
> 155 Spring Street NY, NY 10012
> p.212-285-8600 x385 f.212-285-8999
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message