Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BEA37DF28 for ; Sat, 2 Feb 2013 02:03:12 +0000 (UTC) Received: (qmail 92601 invoked by uid 500); 2 Feb 2013 02:03:11 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 92555 invoked by uid 500); 2 Feb 2013 02:03:11 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 92541 invoked by uid 99); 2 Feb 2013 02:03:11 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 02 Feb 2013 02:03:11 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of erickerickson@gmail.com designates 209.85.216.53 as permitted sender) Received: from [209.85.216.53] (HELO mail-qa0-f53.google.com) (209.85.216.53) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 02 Feb 2013 02:03:02 +0000 Received: by mail-qa0-f53.google.com with SMTP id z4so605571qan.12 for ; Fri, 01 Feb 2013 18:02:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=AkE682rSCwEB115l3OSQAfnwQM0uTRZuZjvCyFMEsvs=; b=0lz36zQCZmYWuMDAGi7X0p4NozVMrv/tbCwfArKzPkyz1ckds+PAt96/9ccV2wQ+fa y8iCmbGNI0y7YzQUfSv9Q4E26hDDhI5Vf18h8XH5An72Zc7DLHBAPJmXRPCXZb3Hmj/4 Z1W/m7Eh0PFLh6D5XMk9n54mrGh3wDhdLUr9tVBrNF7VtsUGbKPPp2VY/raUeNzO4lwa yPFxK3MhZQtlhkpykXJc/tEqQo+CtII0akuoPM6VaxJSAhHxkB2o+PUUzgDV/czSOVAc QpT/6MR76jQICG/kiOHjJpIluXoeZb7CKbESsOig10bKA6L74mAOsCgPN7m47UVWMyuL bRfg== MIME-Version: 1.0 X-Received: by 10.49.62.42 with SMTP id v10mr17644204qer.2.1359770561282; Fri, 01 Feb 2013 18:02:41 -0800 (PST) Received: by 10.49.12.240 with HTTP; Fri, 1 Feb 2013 18:02:41 -0800 (PST) In-Reply-To: <524BEB59-DF9A-4802-9786-603B85F52140@gmail.com> References: <7B66806C-C64A-4D00-BB7C-ECEFAF16207E@gmail.com> <5AF2F52D-B96C-4B2E-963D-BC96195EAE2D@gmail.com> <524BEB59-DF9A-4802-9786-603B85F52140@gmail.com> Date: Fri, 1 Feb 2013 21:02:41 -0500 Message-ID: Subject: Re: Stress test deadlocks From: Erick Erickson To: dev@lucene.apache.org Content-Type: multipart/alternative; boundary=047d7bacb8f048007904d4b441d5 X-Virus-Checked: Checked by ClamAV on apache.org --047d7bacb8f048007904d4b441d5 Content-Type: text/plain; charset=ISO-8859-1 First, about the thread interrupt exceptions; what do you think about not logging them? I could argue that if they are benign, reporting them adds unnecessary stress. I kinda figured they were harmless but thought it might be worth mentioning. Second, I re-worked the stress test program to use the old-style solr.xml file then re-ran the tests from trunk w/o any of the changes for SOLR-4196. I worked for a couple of hours then I had to stop, but tonight it ran for just a few minutes (I updated the code this morning) and got the same error (stack below just in case I'm imagining things). Next step I guess is I'll apply the changes you indicated above to trunk and see if it I can make it happen again. That said, it's a bit of apples-to-oranges but worth doing nonetheless... It's still clearly happening from some relatively new code related to the transient core thing given the trace is coming form removeEldestEntry eventually.... Found one Java-level deadlock: ============================= "commitScheduler-15616-thread-1": waiting to lock monitor 7f920387fd58 (object 7879df120, a org.apache.solr.update.DefaultSolrCoreState), which is held by "qtp1490642445-15" "qtp1490642445-15": waiting to lock monitor 7f9204803bc0 (object 786d3df78, a java.lang.Object), which is held by "commitScheduler-15616-thread-1" Java stack information for the threads listed above: =================================================== "commitScheduler-15616-thread-1": at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78) - waiting to lock <7879df120> (a org.apache.solr.update.DefaultSolrCoreState) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:550) - locked <786d3df78> (a java.lang.Object) at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) "qtp1490642445-15": at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:705) - waiting to lock <786d3df78> (a java.lang.Object) at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64) - locked <7879df120> (a org.apache.solr.update.DefaultSolrCoreState) at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272) - locked <7879df120> (a org.apache.solr.update.DefaultSolrCoreState) at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888) - locked <7879df120> (a org.apache.solr.update.DefaultSolrCoreState) at org.apache.solr.core.SolrCore.close(SolrCore.java:980) at org.apache.solr.core.CoreContainer$2.removeEldestEntry(CoreContainer.java:385) at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:410) at java.util.HashMap.put(HashMap.java:385) at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:864) - locked <785614df8> (a org.apache.solr.core.CoreContainer$2) at org.apache.solr.core.CoreContainer.registerLazyCore(CoreContainer.java:829) at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1321) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:680) Found 1 deadlock. On Wed, Jan 30, 2013 at 11:06 AM, Mark Miller wrote: > > Without evidence of further damage, that is fine - it just means a thread > was interrupted. > > - Mark > > On Jan 30, 2013, at 10:52 AM, Erick Erickson > wrote: > > > FWIW, by the way, I'm getting some exceptions in the solr log, here are > the two patterns I see on a quick look (this is with my hacks, but the test > of your patch also produced some like this, I think they were the same). > Your comment about trying this with the old-style XML is getting more > compelling.... > > > > But do note that the stress tests still ran to completion OK, which > means that all documents sent to the server made it into the indexes.... > > > > Jan 30, 2013 10:20:27 AM org.apache.solr.common.SolrException log > > SEVERE: java.lang.InterruptedException > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279) > > at > java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218) > > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:597) > > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:680) > > > > > > FWIW > > > > > > On Wed, Jan 30, 2013 at 8:46 AM, Erick Erickson > wrote: > > No joy. It ran to completion on one of my machines for an hour, but not > the other, stack traces below. > > > > About running without the patch. The other thing that's different is > I've changed where the core.close happens in the patch as opposed to the > old code, which may confuse things. OTOH, it'd be interesting to try since > my changes to where I called close were based on a faulty assumption about > where the lock was occurring. I did a quick hack that I have yet to test > making the stress tester bang up old-style solr.xml setups, but haven't run > it yet. I can give it a try against an unmodified trunk if you think that > would generate useful information but I sadly fear it's an apples/oranges > comparison. > > > > I'm on IM or we can voice chat if you want to strategize, but I won't be > able to devote much time to this until tonight or tomorrow. I can apply > patches and run tests all day though.... > > > > Although before digging to deeply, I had to cobble the patch into > DefaultSolrCoreState and I might have screwed it up. Is the finally block > containing if (yielded) in the right place? > > > > Here's my patched code, but maybe you could just send me the whole file? > I haven't changed it outside this patch.... > > > > synchronized (writerPauseLock) { > > // we need to wait for the Writer to fall out of use > > // first lets stop it from being lent out > > pauseWriter = true; > > // then lets wait until its out of use > > log.info("Waiting until IndexWriter is unused... core=" + > coreName); > > boolean yielded = false; > > try { > > while (!writerFree) { > > // yield the commit lock > > core.getUpdateHandler().yieldCommitLock(); > > yielded = true; > > try { > > writerPauseLock.wait(100); > > } catch (InterruptedException e) {} > > if (closed) { > > throw new RuntimeException("SolrCoreState already closed"); > > } > > } > > } finally { > > if (yielded) { > > core.getUpdateHandler().getCommitLock(); > > } > > } > > try { > > if (indexWriter != null) { > > if (!rollback) { > > try { > > > > > > > > Stack trace for the deadlock bits, full file attached: > > > > Found one Java-level deadlock: > > ============================= > > "commitScheduler-36850-thread-1": > > waiting to lock monitor 7fc2c625f7f8 (object 7429863a0, a > org.apache.solr.update.DefaultSolrCoreState), > > which is held by "qtp132616134-30" > > "qtp132616134-30": > > waiting for ownable synchronizer 740bc1108, (a > java.util.concurrent.locks.ReentrantLock$NonfairSync), > > which is held by "commitScheduler-36850-thread-1" > > > > Java stack information for the threads listed above: > > =================================================== > > "commitScheduler-36850-thread-1": > > at > org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78) > > - waiting to lock <7429863a0> (a > org.apache.solr.update.DefaultSolrCoreState) > > at > org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359) > > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561) > > - locked <74066b838> (a java.lang.Object) > > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:680) > > "qtp132616134-30": > > at sun.misc.Unsafe.park(Native Method) > > - parking to wait for <740bc1108> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > > at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > > at > org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668) > > at > org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64) > > - locked <7429863a0> (a > org.apache.solr.update.DefaultSolrCoreState) > > at > org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:280) > > - locked <7429863a0> (a > org.apache.solr.update.DefaultSolrCoreState) > > at > org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888) > > - locked <7429863a0> (a > org.apache.solr.update.DefaultSolrCoreState) > > at org.apache.solr.core.SolrCore.close(SolrCore.java:980) > > at > org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1466) > > at > org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730) > > at > org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1138) > > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190) > > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) > > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) > > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) > > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > > at org.eclipse.jetty.server.Server.handle(Server.java:365) > > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) > > at > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > > at > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) > > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) > > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948) > > at > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) > > at > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > > at java.lang.Thread.run(Thread.java:680) > > > > Found 1 deadlock. > > > > > > > > On Wed, Jan 30, 2013 at 7:11 AM, Erick Erickson > wrote: > > bq: I don't follow this at all - of course you could rapidly load and > unload cores at the same time before this patch? > > > > Not quite what I was trying to say. The stress test opens and closes > cores a LOT. Of course you could open and close cores simultaneously > before. In fact given what I think is a new pattern I'm amazed that there > aren't a lot more problems, that's some damn good code. What's new is the > stress test opens and closes a core with every call. From 30 threads, 15 > indexing and 15 querying. > > > > bq: If you cannot easily produce a test that causes deadlock without > your patch > > > > I don't know how you'd really write a predictable junit test, the time > window for the race here is small. I had to run my stress test for 20-30 > minutes to hit it. I think I can modify the stress test to use old-style > solr.xml which I could run against current trunk, is it worth it though > after your second look? Note that this open/closing a core every call from > the stress test wasn't really possible the same way before SOLR-1028, one > of the keys is that the transient cores have to be aged out. The bolds > below are new code (SOLR-1028+): > > > > at > org.apache.solr.core.CoreMaps$1.removeEldestEntry(CoreContainer.java:1384) > > ...... > > at > org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1444) > > > > > > If you still think it's worthwhile, I have some travel time tomorrow > that I could use to make this test run with old-style solr.xml. Let me > know. I'm also wondering if the stress test should go into our test suite > somewhere. It's possible to bring the stress test into junit I think, > there's nothing magic about it. But it might be better as an external test > that we run on, say, a nightly or weekly basis, is there precedent? > > > > bq: Because they are different locks protecting different state. > > > > So you're saying that synchronizing the method actually is protecting > the SolrCore that's passed as a parameter? Otherwise I don't get it, seems > like moving the synchronized block to the first line of, e.g., > newIndexWriter and removing synchronized from the method signature would > be sufficient. That said, my hack of removing the synchronized from the > method signature was more to poke that what I thought I saw than a well > thought-out solution... Which is why I'm glad you're looking too... > > > > Yes, the code attached to the JIRA is the latest. You've got enough on > your plate I'm sure, I'll apply your patch and let you know. I had a little > trouble with SVN applying it cleanly, but I think I reconciled it > correctly... > > > > > > On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller > wrote: > > Do you have your latest work attached to the issue? If so, I'll start > working with it locally. > > > > For now, can you try this experimental, test patch and see what the > results are? > > > > Index: > solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > > =================================================================== > > --- solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > (revision 1440275) > > +++ solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java > (working copy) > > @@ -135,13 +135,24 @@ > > pauseWriter = true; > > // then lets wait until its out of use > > log.info("Waiting until IndexWriter is unused... core=" + > coreName); > > + > > + boolean yielded = false; > > + try { > > while (!writerFree) { > > - try { > > - writerPauseLock.wait(100); > > - } catch (InterruptedException e) {} > > - > > - if (closed) { > > - throw new RuntimeException("SolrCoreState already closed"); > > + // yield the commit lock > > + core.getUpdateHandler().yieldCommitLock(); > > + yielded = true; > > + try { > > + writerPauseLock.wait(100); > > + } catch (InterruptedException e) {} > > + > > + if (closed) { > > + throw new RuntimeException("SolrCoreState already closed"); > > + } > > + } > > + } finally { > > + if (yielded) { > > + core.getUpdateHandler().getCommitLock(); > > } > > } > > > > Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java > > =================================================================== > > --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java > (revision 1440275) > > +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java > (working copy) > > @@ -189,4 +189,10 @@ > > } > > > > public abstract void split(SplitIndexCommand cmd) throws IOException; > > + > > + > > + public void getCommitLock() {} > > + > > + > > + public void yieldCommitLock() {} > > } > > Index: > solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > > =================================================================== > > --- solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > (revision 1440275) > > +++ solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java > (working copy) > > @@ -830,4 +830,13 @@ > > public CommitTracker getSoftCommitTracker() { > > return softCommitTracker; > > } > > + > > + public void getCommitLock() { > > + commitLock.lock(); > > + } > > + > > + > > + public void yieldCommitLock() { > > + commitLock.unlock(); > > + } > > } > > > > > > On Jan 29, 2013, at 11:24 PM, Mark Miller wrote: > > > > > Digging into the stack traces... > > > > > > This shows a thread waiting for the commit lock trying to close an > index writer. > > > > > > There is another thread with the commit lock that is waiting for the > writer to be returned. > > > > > > That seems to be the situation - a race around the commit lock. > > > > > > Needs some thought. > > > > > > - Mark > > > > > > On Jan 29, 2013, at 8:31 AM, Erick Erickson > wrote: > > > > > >> Java stack information for the threads listed above: > > >> =================================================== > > >> "commitScheduler-42617-thread-1": > > >> at > org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78) > > >> - waiting to lock <78b4aa518> (a > org.apache.solr.update.DefaultSolrCoreState) > > >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359) > > >> at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561) > > >> - locked <7884ca730> (a java.lang.Object) > > >> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > > >> at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > >> at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > > >> at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > > >> at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > >> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > >> at java.lang.Thread.run(Thread.java:680) > > >> > > >> ********* > > >> Other thread > > >> "qtp1401888126-32": > > >> at sun.misc.Unsafe.park(Native Method) > > >> - parking to wait for <788d73208> (a > > >> java.util.concurrent.locks.ReentrantLock$NonfairSync) > > >> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) > > >> at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > > >> at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > > >> at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > > >> at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > > >> at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > > >> at > org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668) > > >> at > org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64) > > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > > >> at > org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272) > > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > > >> at > org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888) > > >> - locked <78b4aa518> (a org.apache.solr.update.DefaultSolrCoreState) > > >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980) > > >> at > org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465) > > >> at > org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730) > > >> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137) > > >> at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190) > > >> at > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org > > For additional commands, e-mail: dev-help@lucene.apache.org > > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org > For additional commands, e-mail: dev-help@lucene.apache.org > > --047d7bacb8f048007904d4b441d5 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
First, about the thread interrupt exceptions; what do you = think about not logging them? I could argue that if they are benign, report= ing them adds unnecessary stress. I kinda figured they were harmless but th= ought it might be worth mentioning.

Second, I re-worked the stress test program to use the old-s= tyle solr.xml file then re-ran the tests from trunk w/o any of the changes = for SOLR-4196. I worked for a couple of hours then I had to stop, but tonig= ht it ran for just a few minutes (I updated the code this morning) and got = the same error (stack below just in case I'm imagining things). Next st= ep I guess is I'll apply the changes you indicated above to trunk and s= ee if it I can make it happen again. That said, it's a bit of apples-to= -oranges but worth doing nonetheless... It's still clearly happening fr= om some relatively new code related to the transient core thing given the t= race is coming form removeEldestEntry eventually....



Found one Java-level dead= lock:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
"commitScheduler-15616-thread= -1":
=A0 waiting to lock monitor 7f920387fd58 (object 7879df= 120, a org.apache.solr.update.DefaultSolrCoreState),
=A0 which is held by "qtp1490642445-15"
"qtp1= 490642445-15":
=A0 waiting to lock monitor 7f9204803bc0 (obj= ect 786d3df78, a java.lang.Object),
=A0 which is held by "co= mmitScheduler-15616-thread-1"

Java stack information for the threads listed above:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D
"commitScheduler-15616-thread-1":
at org.apache.solr.upd= ate.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
- waiting to lock &= lt;7879df120> (a org.apache.solr.update.DefaultSolrCoreState)
= at org.apache.solr.core.= SolrCore.openNewSearcher(SolrCore.java:1359)
at org.apache.solr.= core.SolrCore.getSearcher(SolrCore.java:1535)
at org.apache.solr.update.DirectUpdateHandl= er2.commit(DirectUpdateHandler2.java:550)
- locked <786d3d= f78> (a java.lang.Object)
at org.apache.solr.update.CommitTracker.run(CommitTracker.ja= va:216)
at java.util.concur= rent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTas= k$Sync.innerRun(FutureTask.java:303)
at java.util.concur= rent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExe= cutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concur= rent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoo= lExecutor.java:206)
<= /span>at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolE= xecutor.java:886)
at java.util.concur= rent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<= span class=3D"" style=3D"white-space:pre"> at java.lang.Thread.run(T= hread.java:680)
"qtp1490642445-15":
at org.apache.solr.update.DirectUpdateHandler2.closeWr= iter(DirectUpdateHandler2.java:705)
- waiting to lock <786d3df78> (a java.lang.Obje= ct)
at org.apache.solr.= update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)<= /div>
- locked <7= 879df120> (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.= update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
= - locked <7879df120&g= t; (a org.apache.solr.update.DefaultSolrCoreState)
at org.apache.solr.= core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
- locked <7879df120> (a org.a= pache.solr.update.DefaultSolrCoreState)
at org.apache.solr.= core.SolrCore.close(SolrCore.java:980)
at org.apache.solr.core.CoreContainer$2.removeElde= stEntry(CoreContainer.java:385)
at java.util.Linked= HashMap.addEntry(LinkedHashMap.java:410)
at java.util.HashMap.put(HashMap.java:385)
at org.apache.solr.core.= CoreContainer.registerCore(CoreContainer.java:864)
- locked <785614df8> (a org.apac= he.solr.core.CoreContainer$2)
at org.apache.solr.= core.CoreContainer.registerLazyCore(CoreContainer.java:829)
at org.apache.solr.core.CoreC= ontainer.getCore(CoreContainer.java:1321)
at org.apache.solr.= servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
= at org.eclipse.jetty.ser= vlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jett= y.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.ha= ndler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jett= y.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.s= ession.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jett= y.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.s= ervlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jett= y.server.session.SessionHandler.doScope(SessionHandler.java:193)
= at org.eclipse.jetty.ser= ver.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jett= y.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server= .handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)=
at org.eclipse.jett= y.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
=
at org.eclipse.jett= y.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jett= y.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnectio= n.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jett= y.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:5= 3)
at org.ecli= pse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnecti= on.java:926)
at org.eclipse.jett= y.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpC= onnection.java:988)
<= /span>at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jett= y.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.Blockin= gHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jett= y.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264= )
at org.eclip= se.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jett= y.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
<= span class=3D"" style=3D"white-space:pre"> at java.lang.Thread.run(T= hread.java:680)

Found 1 deadlock.

=A0=A0
=


On Wed,= Jan 30, 2013 at 11:06 AM, Mark Miller <markrmiller@gmail.com><= /span> wrote:

Without evidence of further damage, that is fine - it just means a thread w= as interrupted.

- Mark

On Jan 30, 2013, at 10:52 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> FWIW, by the way, I'm getting some exceptions in the solr log, her= e are the two patterns I see on a quick look (this is with my hacks, but th= e test of your patch also produced some like this, I think they were the sa= me). Your comment about trying this with the old-style XML is getting more = compelling....
>
> But do note that the stress tests still ran to completion OK, which me= ans that all documents sent to the server made it into the indexes....
>
> Jan 30, 2013 10:20:27 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.InterruptedException
> =A0 =A0 =A0 at java.util.concurrent.locks.AbstractQueuedSynchronizer.a= cquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTas= k.java:218)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask.get(FutureTask.java:83)=
> =A0 =A0 =A0 at org.apache.solr.update.DirectUpdateHandler2.commit(Dire= ctUpdateHandler2.java:597)
> =A0 =A0 =A0 at org.apache.solr.update.CommitTracker.run(CommitTracker.= java:216)
> =A0 =A0 =A0 at java.util.concurrent.Executors$RunnableAdapter.call(Exe= cutors.java:439)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTas= k.java:303)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask.run(FutureTask.java:138= )
> =A0 =A0 =A0 at java.util.concurrent.ScheduledThreadPoolExecutor$Schedu= ledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> =A0 =A0 =A0 at java.util.concurrent.ScheduledThreadPoolExecutor$Schedu= ledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> =A0 =A0 =A0 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(= ThreadPoolExecutor.java:886)
> =A0 =A0 =A0 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Thre= adPoolExecutor.java:908)
> =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:680)
>
>
> FWIW
>
>
> On Wed, Jan 30, 2013 at 8:46 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> No joy. It ran to completion on one of my machines for an hour, but no= t the other, stack traces below.
>
> About running without the patch. The other thing that's different = is I've changed where the core.close happens in the patch as opposed to= the old code, which may confuse things. OTOH, it'd be interesting to t= ry since my changes to where I called close were based on a faulty assumpti= on about where the lock was occurring. I did a quick hack that I have yet t= o test making the stress tester bang up old-style solr.xml setups, but have= n't run it yet. I can give it a try against an unmodified trunk if you = think that would generate useful information but I sadly fear it's an a= pples/oranges comparison.
>
> I'm on IM or we can voice chat if you want to strategize, but I wo= n't be able to devote much time to this until tonight or tomorrow. I ca= n apply patches and run tests all day though....
>
> Although before digging to deeply, I had to cobble the patch into Defa= ultSolrCoreState and I might have screwed it up. Is the finally block =A0co= ntaining if (yielded) in the right place?
>
> Here's my patched code, but maybe you could just send me the whole= file? I haven't changed it outside this patch....
>
> =A0 =A0 synchronized (writerPauseLock) {
> =A0 =A0 =A0 // we need to wait for the Writer to fall out of use
> =A0 =A0 =A0 // first lets stop it from being lent out
> =A0 =A0 =A0 pauseWriter =3D true;
> =A0 =A0 =A0 // then lets wait until its out of use
> =A0 =A0 =A0 log.info= ("Waiting until IndexWriter is unused... core=3D" + coreName); > =A0 =A0 =A0 boolean yielded =3D false;
> =A0 =A0 =A0 try {
> =A0 =A0 =A0 =A0 while (!writerFree) {
> =A0 =A0 =A0 =A0 =A0 // yield the commit lock
> =A0 =A0 =A0 =A0 =A0 core.getUpdateHandler().yieldCommitLock();
> =A0 =A0 =A0 =A0 =A0 yielded =3D true;
> =A0 =A0 =A0 =A0 =A0 try {
> =A0 =A0 =A0 =A0 =A0 =A0 writerPauseLock.wait(100);
> =A0 =A0 =A0 =A0 =A0 } catch (InterruptedException e) {}
> =A0 =A0 =A0 =A0 =A0 if (closed) {
> =A0 =A0 =A0 =A0 =A0 =A0 throw new RuntimeException("SolrCoreState= already closed");
> =A0 =A0 =A0 =A0 =A0 }
> =A0 =A0 =A0 =A0 }
> =A0 =A0 =A0 } finally {
> =A0 =A0 =A0 =A0 if (yielded) {
> =A0 =A0 =A0 =A0 core.getUpdateHandler().getCommitLock();
> =A0 =A0 =A0 =A0 }
> =A0 =A0 =A0 }
> =A0 =A0 =A0 try {
> =A0 =A0 =A0 =A0 if (indexWriter !=3D null) {
> =A0 =A0 =A0 =A0 =A0 if (!rollback) {
> =A0 =A0 =A0 =A0 =A0 =A0 try {
>
>
>
> Stack trace for the deadlock bits, full file attached:
>
> Found one Java-level deadlock:
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D
> "commitScheduler-36850-thread-1":
> =A0 waiting to lock monitor 7fc2c625f7f8 (object 7429863a0, a org.apac= he.solr.update.DefaultSolrCoreState),
> =A0 which is held by "qtp132616134-30"
> "qtp132616134-30":
> =A0 waiting for ownable synchronizer 740bc1108, (a java.util.concurren= t.locks.ReentrantLock$NonfairSync),
> =A0 which is held by "commitScheduler-36850-thread-1"
>
> Java stack information for the threads listed above:
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D
> "commitScheduler-36850-thread-1":
> =A0 =A0 =A0 at org.apache.solr.update.DefaultSolrCoreState.getIndexWri= ter(DefaultSolrCoreState.java:78)
> =A0 =A0 =A0 - waiting to lock <7429863a0> (a org.apache.solr.upd= ate.DefaultSolrCoreState)
> =A0 =A0 =A0 at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.= java:1359)
> =A0 =A0 =A0 at org.apache.solr.update.DirectUpdateHandler2.commit(Dire= ctUpdateHandler2.java:561)
> =A0 =A0 =A0 - locked <74066b838> (a java.lang.Object)
> =A0 =A0 =A0 at org.apache.solr.update.CommitTracker.run(CommitTracker.= java:216)
> =A0 =A0 =A0 at java.util.concurrent.Executors$RunnableAdapter.call(Exe= cutors.java:439)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTas= k.java:303)
> =A0 =A0 =A0 at java.util.concurrent.FutureTask.run(FutureTask.java:138= )
> =A0 =A0 =A0 at java.util.concurrent.ScheduledThreadPoolExecutor$Schedu= ledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> =A0 =A0 =A0 at java.util.concurrent.ScheduledThreadPoolExecutor$Schedu= ledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> =A0 =A0 =A0 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(= ThreadPoolExecutor.java:886)
> =A0 =A0 =A0 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Thre= adPoolExecutor.java:908)
> =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:680)
> "qtp132616134-30":
> =A0 =A0 =A0 at sun.misc.Unsafe.park(Native Method)
> =A0 =A0 =A0 - parking to wait for =A0<740bc1108> (a java.util.co= ncurrent.locks.ReentrantLock$NonfairSync)
> =A0 =A0 =A0 at java.util.concurrent.locks.LockSupport.park(LockSupport= .java:156)
> =A0 =A0 =A0 at java.util.concurrent.locks.AbstractQueuedSynchronizer.p= arkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> =A0 =A0 =A0 at java.util.concurrent.locks.AbstractQueuedSynchronizer.a= cquireQueued(AbstractQueuedSynchronizer.java:842)
> =A0 =A0 =A0 at java.util.concurrent.locks.AbstractQueuedSynchronizer.a= cquire(AbstractQueuedSynchronizer.java:1178)
> =A0 =A0 =A0 at java.util.concurrent.locks.ReentrantLock$NonfairSync.lo= ck(ReentrantLock.java:186)
> =A0 =A0 =A0 at java.util.concurrent.locks.ReentrantLock.lock(Reentrant= Lock.java:262)
> =A0 =A0 =A0 at org.apache.solr.update.DirectUpdateHandler2.closeWriter= (DirectUpdateHandler2.java:668)
> =A0 =A0 =A0 at org.apache.solr.update.DefaultSolrCoreState.closeIndexW= riter(DefaultSolrCoreState.java:64)
> =A0 =A0 =A0 - locked <7429863a0> (a org.apache.solr.update.Defau= ltSolrCoreState)
> =A0 =A0 =A0 at org.apache.solr.update.DefaultSolrCoreState.close(Defau= ltSolrCoreState.java:280)
> =A0 =A0 =A0 - locked <7429863a0> (a org.apache.solr.update.Defau= ltSolrCoreState)
> =A0 =A0 =A0 at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrC= ore.java:888)
> =A0 =A0 =A0 - locked <7429863a0> (a org.apache.solr.update.Defau= ltSolrCoreState)
> =A0 =A0 =A0 at org.apache.solr.core.SolrCore.close(SolrCore.java:980)<= br> > =A0 =A0 =A0 at org.apache.solr.core.CoreMaps.putTransientCore(CoreCont= ainer.java:1466)
> =A0 =A0 =A0 at org.apache.solr.core.CoreContainer.registerCore(CoreCon= tainer.java:730)
> =A0 =A0 =A0 at org.apache.solr.core.CoreContainer.getCore(CoreContaine= r.java:1138)
> =A0 =A0 =A0 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(Sol= rDispatchFilter.java:190)
> =A0 =A0 =A0 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.do= Filter(ServletHandler.java:1307)
> =A0 =A0 =A0 at org.eclipse.jetty.servlet.ServletHandler.doHandle(Servl= etHandler.java:453)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.ScopedHandler.handle(S= copedHandler.java:137)
> =A0 =A0 =A0 at org.eclipse.jetty.security.SecurityHandler.handle(Secur= ityHandler.java:560)
> =A0 =A0 =A0 at org.eclipse.jetty.server.session.SessionHandler.doHandl= e(SessionHandler.java:231)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.ContextHandler.doHandl= e(ContextHandler.java:1072)
> =A0 =A0 =A0 at org.eclipse.jetty.servlet.ServletHandler.doScope(Servle= tHandler.java:382)
> =A0 =A0 =A0 at org.eclipse.jetty.server.session.SessionHandler.doScope= (SessionHandler.java:193)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.ContextHandler.doScope= (ContextHandler.java:1006)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.ScopedHandler.handle(S= copedHandler.java:135)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.ContextHandlerCollecti= on.handle(ContextHandlerCollection.java:255)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.HandlerCollection.hand= le(HandlerCollection.java:154)
> =A0 =A0 =A0 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(= HandlerWrapper.java:116)
> =A0 =A0 =A0 at org.eclipse.jetty.server.Server.handle(Server.java:365)=
> =A0 =A0 =A0 at org.eclipse.jetty.server.AbstractHttpConnection.handleR= equest(AbstractHttpConnection.java:485)
> =A0 =A0 =A0 at org.eclipse.jetty.server.BlockingHttpConnection.handleR= equest(BlockingHttpConnection.java:53)
> =A0 =A0 =A0 at org.eclipse.jetty.server.AbstractHttpConnection.content= (AbstractHttpConnection.java:937)
> =A0 =A0 =A0 at org.eclipse.jetty.server.AbstractHttpConnection$Request= Handler.content(AbstractHttpConnection.java:998)
> =A0 =A0 =A0 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.= java:948)
> =A0 =A0 =A0 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpPa= rser.java:240)
> =A0 =A0 =A0 at org.eclipse.jetty.server.BlockingHttpConnection.handle(= BlockingHttpConnection.java:72)
> =A0 =A0 =A0 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorE= ndPoint.run(SocketConnector.java:264)
> =A0 =A0 =A0 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Q= ueuedThreadPool.java:608)
> =A0 =A0 =A0 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(Qu= euedThreadPool.java:543)
> =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:680)
>
> Found 1 deadlock.
>
>
>
> On Wed, Jan 30, 2013 at 7:11 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> bq: I don't follow this at all - of course you could rapidly load = and unload cores at the same time before this patch?
>
> Not quite what I was trying to say. The stress test opens and closes c= ores a LOT. Of course you could open and close cores simultaneously before.= In fact given what I think is a new pattern I'm amazed that there aren= 't a lot more problems, that's some damn good code. What's new = is the stress test opens and closes a core with every call. From 30 threads= , 15 indexing and 15 querying.
>
> bq: If you cannot easily produce a test that causes deadlock without y= our patch
>
> I don't know how you'd really write a predictable junit test, = the time window for the race here is small. I had to run my stress test for= 20-30 minutes to hit it. I think I can modify the stress test to use old-s= tyle solr.xml which I could run against current trunk, is it worth it thoug= h after your second look? Note that this open/closing a core every call fro= m the stress test wasn't really possible the same way before SOLR-1028,= one of the keys is that the transient cores have to be aged out. The bolds= below are new code (SOLR-1028+):
>
> at org.apache.solr.core.CoreMaps$1.removeEldestEntry(CoreContainer.jav= a:1384)
> ......
> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1= 444)
>
>
> If you still think it's worthwhile, I have some travel time tomorr= ow that I could use to make this test run with old-style solr.xml. Let me k= now. I'm also wondering if the stress test should go into our test suit= e somewhere. It's possible to bring the stress test into junit I think,= there's nothing magic about it. But it might be better as an external = test that we run on, say, a nightly or weekly basis, is there precedent? >
> bq: Because they are different locks protecting different state.
>
> So you're saying that synchronizing the method actually is protect= ing the SolrCore that's passed as a parameter? Otherwise I don't ge= t it, seems like moving the synchronized block to the first line of, e.g., = =A0newIndexWriter and removing synchronized from the method signature would= be sufficient. That said, my hack of removing the synchronized from the me= thod signature was more to poke that what I thought I saw than a well thoug= ht-out solution... Which is why I'm glad you're looking too...
>
> Yes, the code attached to the JIRA is the latest. You've got enoug= h on your plate I'm sure, I'll apply your patch and let you know. I= had a little trouble with SVN applying it cleanly, but =A0I think I reconc= iled it correctly...
>
>
> On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller <markrmiller@gmail.com> wrote:
> Do you have your latest work attached to the issue? If so, I'll st= art working with it locally.
>
> For now, can you try this experimental, test patch and see what the re= sults are?
>
> Index: solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.= java
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.jav= a (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.jav= a (working copy)
> @@ -135,13 +135,24 @@
> =A0 =A0 =A0 =A0pauseWriter =3D true;
> =A0 =A0 =A0 =A0// then lets wait until its out of use
> =A0 =A0 =A0 =A0log.info<= /a>("Waiting until IndexWriter is unused... core=3D" + coreName);=
> +
> + =A0 =A0 =A0boolean yielded =3D false;
> + =A0 =A0 =A0try {
> =A0 =A0 =A0 =A0while (!writerFree) {
> - =A0 =A0 =A0 =A0try {
> - =A0 =A0 =A0 =A0 =A0writerPauseLock.wait(100);
> - =A0 =A0 =A0 =A0} catch (InterruptedException e) {}
> -
> - =A0 =A0 =A0 =A0if (closed) {
> - =A0 =A0 =A0 =A0 =A0throw new RuntimeException("SolrCoreState al= ready closed");
> + =A0 =A0 =A0 =A0 =A0// yield the commit lock
> + =A0 =A0 =A0 =A0 =A0core.getUpdateHandler().yieldCommitLock();
> + =A0 =A0 =A0 =A0 =A0yielded =3D true;
> + =A0 =A0 =A0 =A0 =A0try {
> + =A0 =A0 =A0 =A0 =A0 =A0writerPauseLock.wait(100);
> + =A0 =A0 =A0 =A0 =A0} catch (InterruptedException e) {}
> +
> + =A0 =A0 =A0 =A0 =A0if (closed) {
> + =A0 =A0 =A0 =A0 =A0 =A0throw new RuntimeException("SolrCoreStat= e already closed");
> + =A0 =A0 =A0 =A0 =A0}
> + =A0 =A0 =A0 =A0}
> + =A0 =A0 =A0} finally {
> + =A0 =A0 =A0 =A0if (yielded) {
> + =A0 =A0 =A0 =A0 =A0core.getUpdateHandler().getCommitLock();
> =A0 =A0 =A0 =A0 =A0}
> =A0 =A0 =A0 =A0}
>
> Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java =A0 = =A0 =A0 =A0(revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java =A0 = =A0 =A0 =A0(working copy)
> @@ -189,4 +189,10 @@
> =A0 =A0}
>
> =A0 =A0public abstract void split(SplitIndexCommand cmd) throws IOExce= ption;
> +
> +
> + =A0public void getCommitLock() {}
> +
> +
> + =A0public void yieldCommitLock() {}
> =A0}
> Index: solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.= java
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.jav= a (revision 1440275)
> +++ solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.jav= a (working copy)
> @@ -830,4 +830,13 @@
> =A0 =A0public CommitTracker getSoftCommitTracker() {
> =A0 =A0 =A0return softCommitTracker;
> =A0 =A0}
> +
> + =A0public void getCommitLock() {
> + =A0 =A0commitLock.lock();
> + =A0}
> +
> +
> + =A0public void yieldCommitLock() {
> + =A0 =A0commitLock.unlock();
> + =A0}
> =A0}
>
>
> On Jan 29, 2013, at 11:24 PM, Mark Miller <
markrmiller@gmail.com> wrote:
>
> > Digging into the stack traces...
> >
> > This shows a thread waiting for the commit lock trying to close a= n index writer.
> >
> > There is another thread with the commit lock that is waiting for = the writer to be returned.
> >
> > That seems to be the situation - a race around the commit lock. > >
> > Needs some thought.
> >
> > - Mark
> >
> > On Jan 29, 2013, at 8:31 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> Java stack information for the threads listed above:
> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D
> >> "commitScheduler-42617-thread-1":
> >> at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter= (DefaultSolrCoreState.java:78)
> >> - waiting to lock <78b4aa518> (a org.apache.solr.update= .DefaultSolrCoreState)
> >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.jav= a:1359)
> >> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectU= pdateHandler2.java:561)
> >> - locked <7884ca730> (a java.lang.Object)
> >> at org.apache.solr.update.CommitTracker.run(CommitTracker.jav= a:216)
> >> at java.util.concurrent.Executors$RunnableAdapter.call(Execut= ors.java:439)
> >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.j= ava:303)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:138) > >> at java.util.concurrent.ScheduledThreadPoolExecutor$Scheduled= FutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >> at java.util.concurrent.ScheduledThreadPoolExecutor$Scheduled= FutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Thr= eadPoolExecutor.java:886)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadP= oolExecutor.java:908)
> >> at java.lang.Thread.run(Thread.java:680)
> >>
> >> *********
> >> Other thread
> >> "qtp1401888126-32":
> >> at sun.misc.Unsafe.park(Native Method)
> >> - parking to wait for =A0<788d73208> (a
> >> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> >> at java.util.concurrent.locks.LockSupport.park(LockSupport.ja= va:156)
> >> at java.util.concurrent.locks.AbstractQueuedSynchronizer.park= AndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
> >> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acqu= ireQueued(AbstractQueuedSynchronizer.java:842)
> >> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acqu= ire(AbstractQueuedSynchronizer.java:1178)
> >> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(= ReentrantLock.java:186)
> >> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLoc= k.java:262)
> >> at org.apache.solr.update.DirectUpdateHandler2.closeWriter(Di= rectUpdateHandler2.java:668)
> >> at org.apache.solr.update.DefaultSolrCoreState.closeIndexWrit= er(DefaultSolrCoreState.java:64)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultS= olrCoreState)
> >> at org.apache.solr.update.DefaultSolrCoreState.close(DefaultS= olrCoreState.java:272)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultS= olrCoreState)
> >> at org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore= .java:888)
> >> - locked <78b4aa518> (a org.apache.solr.update.DefaultS= olrCoreState)
> >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
> >> at org.apache.solr.core.CoreMaps.putTransientCore(CoreContain= er.java:1465)
> >> at org.apache.solr.core.CoreContainer.registerCore(CoreContai= ner.java:730)
> >> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.j= ava:1137)
> >> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDi= spatchFilter.java:190)
> >> at
> >
>
>
> ---------------------------------------------------------------------<= br> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


--047d7bacb8f048007904d4b441d5--