lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christopher Tignor <ctig...@thinkmap.com>
Subject Re: IndexWriter optimize() deadlock
Date Fri, 16 Oct 2009 23:11:46 GMT
the doWait() call is synchronized on IndexWriter but it is also, as you
suggest in a loop in a block synchronized on IndexWriter.

The doWait() call returns immediately, still holding the IndexWriter lock
from the loop in the synchornized block as my stack trace shows, without
blocking and giving the merger thread a chance to merge.  It keeps repeating
this doWait procedure which unfortunately never actually waits and the
MergerThread is starved....

C>T>

On Fri, Oct 16, 2009 at 6:53 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I'm glad you worked around it!  But I don't fully understand the
> issue.  That doWait is inside a sync(writer) block... if the Future
> manages to interrupt it, then that thread will release the lock when
> it exits that sync block.
>
> Actually, if the thread was indeed interrupted, you may be hitting this:
>
>    http://issues.apache.org/jira/browse/LUCENE-1573
>
> (which is fixed in 2.9).
>
> Mike
>
> On Fri, Oct 16, 2009 at 6:01 PM, Christopher Tignor
> <ctignor@thinkmap.com> wrote:
> > I discovered the problem and fixed it's effect on my code:
> >
> > Using the source for Lucene version 2.4.1,
> > in IndexWriter.optimize() there is a call to doWait() on line 2283
> > this method attempts to wait for a second in order to give other threads
> it
> > has spawned a chance to acquire it's mutex and complete index merges.
> > It checks if there are any merges left after it wakes up.  if there
> aren't
> > int proceeds.
> >
> > However, using using a Future object associated with this thread and
> calling
> > cancel(true) doesn't allow the Thread to enter into Object.wait() (or the
> > executorservice observing this Future immediately wakes it up or
> something)
> > so it returns immediately and the thread holding the lock on the
> IndexWriter
> > never sceeds it's lock to threads running a merge.
> >
> > I re-wrote my code such that it didn't need to call the interruptible
> > version of Future.cancel() to solve it, i.e. Future.cancel(false)
> >
> > thanks,
> >
> > C>T>
> >
> > On Fri, Oct 16, 2009 at 4:44 PM, Christopher Tignor <
> ctignor@thinkmap.com>wrote:
> >
> >> Indeed it looks like the thread running MergerThread started (After
> passing
> >> off to ConcurentMergeScheduler) by the thread calling
> IndexWriter.optimize()
> >> is indeed waiting on the mutex for the IndexWriter to be free so it can
> use
> >> the object to call mergeInit().
> >>
> >> The IndexWriter however has entered a synchronized() waiting loop,
> waking
> >> up every second ( in doWait() ) and checking if there are running merges
> >> left - which of course there are as the thread responsible for doing the
> >> merging can't get in.  Deadlocked stack traces are below:
> >>
> >> Thread [pool-2-thread-5] (Suspended)
> >>     owns: IndexWriter  (id=71)
> >>     owns: LuceneResultsPersister  (id=85)
> >>     IndexWriter.optimize(int, boolean) line: 2283
> >>     IndexWriter.optimize(boolean) line: 2218
> >>     IndexWriter.optimize() line: 2198
> >>     LuceneResultsPersister.commit() line: 97
> >>     PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer)
> line:
> >> 105
> >>     PersistenceJobConsumer.consume() line: 46
> >>     PersistenceJobConsumer.run() line: 67
> >>     Executors$RunnableAdapter<T>.call() line: 441
> >>     FutureTask$Sync.innerRun() line: 303
> >>     FutureTask<V>.run() line: 138
> >>     ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
> >>     ThreadPoolExecutor$Worker.run() line: 908
> >>     Thread.run() line: 619
> >>
> >> Daemon Thread [Lucene Merge Thread #0] (Suspended)
> >>     waiting for: IndexWriter  (id=71)
> >>     IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971
> >>     IndexWriter.merge(MergePolicy$OneMerge) line: 3879
> >>     ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205
> >>     ConcurrentMergeScheduler$MergeThread.run() line: 260
> >>
> >> I don't really understand how this code is supposed to (and has) worked
> >> before this and what the problem thus might be here:
> >>
> >> In IndexWriter.optimize() at line 2263 we have the synchronized block
> where
> >> doWait becomes true using the parameter-less call to optimize():
> >>
> >> if (doWait) {
> >>       synchronized(this) {
> >>         while(true) {
> >>           if (mergeExceptions.size() > 0) {
> >>             // Forward any exceptions in background merge
> >>             // threads to the current thread:
> >>             final int size = mergeExceptions.size();
> >>             for(int i=0;i<size;i++) {
> >>               final MergePolicy.OneMerge merge = (MergePolicy.OneMerge)
> >> mergeExceptions.get(0);
> >>               if (merge.optimize) {
> >>                 IOException err = new IOException("background merge hit
> >> exception: " + merge.segString(directory));
> >>                 final Throwable t = merge.getException();
> >>                 if (t != null)
> >>                   err.initCause(t);
> >>                 throw err;
> >>               }
> >>             }
> >>           }
> >>
> >>           if (optimizeMergesPending())
> >>             doWait();
> >>           else
> >>             break;
> >>         }
> >>       }
> >>
> >> It is holding this lock while the thread it started to do the mergin is
> >> trying to call it's mergeInit() method.
> >>
> >> Any thoughts?
> >>
> >> thanks,
> >>
> >> C>T>
> >>
> >>
> >> On Fri, Oct 16, 2009 at 4:11 PM, Christopher Tignor <
> ctignor@thinkmap.com>wrote:
> >>
> >>> After tracing through the lucene source more it seems that what is
> >>> happening is after I call Future.cancel(true) on my parent thread,
> >>> optimize() is called and this method launches it's own thread using a
> >>> ConcurrentMergeScheduler$MergeThread to do the actual merging.
> >>>
> >>> When this Thread comes around to calling mergeInit() on my index writer
> -
> >>> a synchronized method - it hangs.  For some reason it seems to no long
> have
> >>> the mutex perhaps?  Trace of this thread's stall below...
> >>>
> >>> Daemon Thread [Lucene Merge Thread #0] (Suspended)
> >>>     IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971
> >>>     IndexWriter.merge(MergePolicy$OneMerge) line: 3879
> >>>     ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205
> >>>     ConcurrentMergeScheduler$MergeThread.run() line: 260
> >>>
> >>> thanks again,
> >>>
> >>> C>T>
> >>>
> >>>
> >>>
> >>> On Fri, Oct 16, 2009 at 3:53 PM, Michael McCandless <
> >>> lucene@mikemccandless.com> wrote:
> >>>
> >>>> But if the Future.cancel call turns out to be a no-op (simply waits
> >>>> instead of interrupting the thread), how could it be that the deadlock
> >>>> only happens when you call it?  Weird.  Are you really sure it's not
> >>>> actually calling Thread.interrupt?
> >>>>
> >>>> That stack trace looks like a normal "optimize is waiting for the
> >>>> background merges to complete".  Is it possible your background merges
> >>>> are hitting exceptions?  You should see them on your error console if
> >>>> so...
> >>>>
> >>>> Mike
> >>>>
> >>>> On Fri, Oct 16, 2009 at 3:17 PM, Christopher Tignor
> >>>> <ctignor@thinkmap.com> wrote:
> >>>> > It doesn't look like my Future.cancel(true) is actually interrupting
> >>>> the
> >>>> > thread.  It only does so "if necessary" and in this case seems
to be
> >>>> letting
> >>>> > the Thread finish gracefully without need for interruption.
> >>>> >
> >>>> > The stack trace leading up to the hanging IndexWriter.optimize()
> method
> >>>> is
> >>>> > below, though not terribly useful I imagine.  Here only 5 documents
> are
> >>>> > trying to be added to an index taking up only 39.2 MB on disk.
>  Again
> >>>> this
> >>>> > deadlocking only happens after I use the Future for this task to
> cancel
> >>>> the
> >>>> > task...
> >>>> >
> >>>> > Thread [pool-3-thread-5] (Suspended)
> >>>> >    IndexWriter.doWait() line: 4494
> >>>> >    IndexWriter.optimize(int, boolean) line: 2283
> >>>> >    IndexWriter.optimize(boolean) line: 2218
> >>>> >    IndexWriter.optimize() line: 2198
> >>>> >    LuceneResultsPersister.commit() line: 97
> >>>> >    PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer)
> >>>> line:
> >>>> > 105
> >>>> >    PersistenceJobConsumer.consume() line: 46
> >>>> >    PersistenceJobConsumer.run() line: 67
> >>>> >    Executors$RunnableAdapter<T>.call() line: 441
> >>>> >    FutureTask$Sync.innerRun() line: 303
> >>>> >    FutureTask<V>.run() line: 138
> >>>> >    ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
> >>>> >    ThreadPoolExecutor$Worker.run() line: 908
> >>>> >    Thread.run() line: 619
> >>>> >
> >>>> > thanks,
> >>>> >
> >>>> > C>T>
> >>>> >
> >>>> > On Fri, Oct 16, 2009 at 1:58 PM, Michael McCandless <
> >>>> > lucene@mikemccandless.com> wrote:
> >>>> >
> >>>> >> My guess is it's the invocation of Thread.interrupt (which
> >>>> >> Future.cancel(true) calls if the task is running) that lead
to the
> >>>> >> deadlock.
> >>>> >>
> >>>> >> Is it possible to get the stack trace of the thrown exception
when
> the
> >>>> >> thread was interrupted?  Maybe indeed something in IW isn't
> cleaning
> >>>> >> up its state on being interrupted.
> >>>> >>
> >>>> >> Mike
> >>>> >>
> >>>> >> On Fri, Oct 16, 2009 at 1:43 PM, Christopher Tignor
> >>>> >> <ctignor@thinkmap.com> wrote:
> >>>> >> > thanks for getting back.
> >>>> >> >
> >>>> >> > I do not lock on the IndexWriter object itself but all
methods in
> my
> >>>> >> > consumer class that use IndexWriter are synchronized (locking
my
> >>>> >> singleton
> >>>> >> > consumer object itself).
> >>>> >> > The thread is waiting at IndexWriter.doWait().  What might
cuase
> >>>> this?
> >>>> >> >
> >>>> >> > thanks -
> >>>> >> >
> >>>> >> > C>T>
> >>>> >> >
> >>>> >> > On Fri, Oct 16, 2009 at 12:58 PM, Uwe Schindler <uwe@thetaphi.de
> >
> >>>> wrote:
> >>>> >> >
> >>>> >> >> Do you use the IndexWriter as mutex in a synchronized()
block?
> This
> >>>> is
> >>>> >> not
> >>>> >> >> supported and may hang. Never lock on IndexWriter
instances.
> >>>> IndexWriter
> >>>> >> >> itself is thread safe.
> >>>> >> >>
> >>>> >> >> -----
> >>>> >> >> Uwe Schindler
> >>>> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>> >> >> http://www.thetaphi.de
> >>>> >> >> eMail: uwe@thetaphi.de
> >>>> >> >>
> >>>> >> >> > -----Original Message-----
> >>>> >> >> > From: Christopher Tignor [mailto:ctignor@thinkmap.com]
> >>>> >> >> > Sent: Friday, October 16, 2009 6:50 PM
> >>>> >> >> > To: java-user
> >>>> >> >> > Subject: IndexWriter optimize() deadlock
> >>>> >> >> >
> >>>> >> >> > Hello,
> >>>> >> >> >
> >>>> >> >> > I am trying to track down the cause of my code
hanging on
> calling
> >>>> >> >> > IndexWriter.optimize() at its doWait() method.
> >>>> >> >> > It appears, thus that it is watiing on other
merges to happen
> >>>> which is
> >>>> >> a
> >>>> >> >> > bit
> >>>> >> >> > confusing to me:
> >>>> >> >> >
> >>>> >> >> > My application is a simple producer consumer
model where
> >>>> documents are
> >>>> >> >> > added
> >>>> >> >> > to a queue by producers and then one consumer
with one
> >>>> indexwriter
> >>>> >> (the
> >>>> >> >> > only
> >>>> >> >> > in the application) periodically calls addDocument()
on a
> batch
> >>>> of
> >>>> >> these
> >>>> >> >> > jobs and then calls optimize(), commit(). and
then close().
> >>>>  There is
> >>>> >> >> only
> >>>> >> >> > one thread running the consumer so I am confused
as to how the
> >>>> >> >> indexwriter
> >>>> >> >> > might be deadlocking itself.  Indeed this is
the only thread
> >>>> active
> >>>> >> when
> >>>> >> >> > the
> >>>> >> >> > deadlock occurs so it seems to be a problem of
reentry.
> >>>> >> >> >
> >>>> >> >> > Importantly, the deadlocking occurs only when
the thread is
> >>>> trying to
> >>>> >> >> > shutdown - that is the Thread running this lucene
consumer has
> a
> >>>> >> Future
> >>>> >> >> > that
> >>>> >> >> > has had its cancel(true) interrupting method
called.  Is it
> >>>> possible
> >>>> >> that
> >>>> >> >> > an
> >>>> >> >> > internal Lucene lock is obtained during addDocument()
and on
> >>>> >> interruption
> >>>> >> >> > is
> >>>> >> >> > never released so the subsequent optimize() call
hangs?  This
> >>>> doesn't
> >>>> >> >> > appear
> >>>> >> >> > to be happening...
> >>>> >> >> >
> >>>> >> >> > Any help appreciated.
> >>>> >> >> >
> >>>> >> >> > thanks,
> >>>> >> >> >
> >>>> >> >> > C>T>
> >>>> >> >> >
> >>>> >> >> > what might I be missing here?
> >>>> >> >> >
> >>>> >> >> > --
> >>>> >> >> > TH!NKMAP
> >>>> >> >> >
> >>>> >> >> > Christopher Tignor | Senior Software Architect
> >>>> >> >> > 155 Spring Street NY, NY 10012
> >>>> >> >> > p.212-285-8600 x385 f.212-285-8999
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> ---------------------------------------------------------------------
> >>>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> >> >> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> >>>> >> >>
> >>>> >> >>
> >>>> >> >
> >>>> >> >
> >>>> >> > --
> >>>> >> > TH!NKMAP
> >>>> >> >
> >>>> >> > Christopher Tignor | Senior Software Architect
> >>>> >> > 155 Spring Street NY, NY 10012
> >>>> >> > p.212-285-8600 x385 f.212-285-8999
> >>>> >> >
> >>>> >>
> >>>> >>
> ---------------------------------------------------------------------
> >>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> > --
> >>>> > TH!NKMAP
> >>>> >
> >>>> > Christopher Tignor | Senior Software Architect
> >>>> > 155 Spring Street NY, NY 10012
> >>>> > p.212-285-8600 x385 f.212-285-8999
> >>>> >
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> TH!NKMAP
> >>>
> >>> Christopher Tignor | Senior Software Architect
> >>> 155 Spring Street NY, NY 10012
> >>> p.212-285-8600 x385 f.212-285-8999
> >>>
> >>
> >>
> >>
> >> --
> >> TH!NKMAP
> >>
> >> Christopher Tignor | Senior Software Architect
> >> 155 Spring Street NY, NY 10012
> >> p.212-285-8600 x385 f.212-285-8999
> >>
> >
> >
> >
> > --
> > TH!NKMAP
> >
> > Christopher Tignor | Senior Software Architect
> > 155 Spring Street NY, NY 10012
> > p.212-285-8600 x385 f.212-285-8999
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message