lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Stress test deadlocks
Date Sat, 02 Feb 2013 22:36:30 GMT
OK, had a chance to untangle myself on the flight. Long and short of it is
that the problem is still there on trunk (sans-4196 but with the changes
above).

Looking at the stack trace some more, I wonder if the issue isn't in
deleteLock (part of SolrCoreState.getUpdateLock, but I won't be able to
look at it in any detail until tomorrow. One thread has a lock on that and
the other is waiting on it, but I don't know yet what other resource is
locked up.

FWIW,
Erick


On Sat, Feb 2, 2013 at 10:29 AM, Erick Erickson <erickerickson@gmail.com>wrote:

> 1> opened a JIRA, see: https://issues.apache.org/jira/browse/SOLR-4398
>
> 2> Ran the test against trunk last night, and things locked up. But when I
> tried to trace back the jstack output against the code I thought I was
> running, I couldn't get the source to line up so I don't really have
> anything to report yet. I won't have a chance to run this again until I get
> home this evening.
>
> If you're really hot to try it yourself in your copious spare time (I know
> you have a lot <G>), the source for the stress test is attached to the
> jira. All you have to do is compile it, fire it up with -w -x, I usually
> give it some room like
>
> java -Xmx1G -jar StresTest.jar -x -w -m=###
> the -m bit is how many minutes to run (default 60). -w means create a
> bunch of cores in <user home>/tmp/cores. -x means "use old style xml".
>
> It'll churn for a short time and give you a "start solr sever and hit
> enter' message, at which point I start Solr with solr home pointing to the
> right place, e.g:
> java -Xmx2G -Dsolr.solr.home=/Users/Erick/tmp/cores -jar start.jar
>
> Then hit enter in the stress test program. There are confirmation messages
> printed out every minute that the stress test is running, I usually just
> ignore things and occasionally check to see if there have been any
> confirmation messages lately.
>
> This all runs against trunk if you use the -x option. My stray thought was
> I wonder if it'd make sense to add this to some kind of regular test
> process just for yucks....
>
> Like I said, though, I'm more than happy to be the stress-test guinea pig,
> the above is strictly if you think it would save you time.
>
> My next task is to get it to fail again and make more sense of the stack
> traces. I'll let you know when I have something more.
>
> Erick
>
>
> On Fri, Feb 1, 2013 at 9:23 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>>
>> On Feb 1, 2013, at 9:02 PM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>> > First, about the thread interrupt exceptions; what do you think about
>> not logging them? I could argue that if they are benign, reporting them
>> adds unnecessary stress. I kinda figured they were harmless but thought it
>> might be worth mentioning.
>>
>> I guess I'd open a JIRA issue to discuss it - we would probably want to
>> consistently tackle the code base. Currently, I think we usually log
>> something on interruptions.
>>
>> >
>> > Second, I re-worked the stress test program to use the old-style
>> solr.xml file then re-ran the tests from trunk w/o any of the changes for
>> SOLR-4196. I worked for a couple of hours then I had to stop, but tonight
>> it ran for just a few minutes (I updated the code this morning) and got the
>> same error (stack below just in case I'm imagining things). Next step I
>> guess is I'll apply the changes you indicated above to trunk and see if it
>> I can make it happen again. That said, it's a bit of apples-to-oranges but
>> worth doing nonetheless... It's still clearly happening from some
>> relatively new code related to the transient core thing given the trace is
>> coming form removeEldestEntry eventually….
>>
>> Since you have the tests and can easily check this, I would appreciate
>> it. We would like to fix this.
>>
>> The below is the same issue. I'm pretty sure the patch addresses it
>> (though don't commit it, it's still hackey), but confirmation would be
>> great.
>>
>> - Mark
>>
>> >
>> >
>> >
>> > Found one Java-level deadlock:
>> > =============================
>> > "commitScheduler-15616-thread-1":
>> >   waiting to lock monitor 7f920387fd58 (object 7879df120, a
>> org.apache.solr.update.DefaultSolrCoreState),
>> >   which is held by "qtp1490642445-15"
>> > "qtp1490642445-15":
>> >   waiting to lock monitor 7f9204803bc0 (object 786d3df78, a
>> java.lang.Object),
>> >   which is held by "commitScheduler-15616-thread-1"
>> >
>> > Java stack information for the threads listed above:
>> > ===================================================
>> > "commitScheduler-15616-thread-1":
>> >       at
>> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>> >       - waiting to lock <7879df120> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> >       at
>> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
>> >       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)
>> >       at
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:550)
>> >       - locked <786d3df78> (a java.lang.Object)
>> >       at
>> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> >       at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >       at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >       at java.lang.Thread.run(Thread.java:680)
>> > "qtp1490642445-15":
>> >       at
>> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:705)
>> >       - waiting to lock <786d3df78> (a java.lang.Object)
>> >       at
>> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
>> >       - locked <7879df120> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> >       at
>> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
>> >       - locked <7879df120> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> >       at
>> org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
>> >       - locked <7879df120> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> >       at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
>> >       at
>> org.apache.solr.core.CoreContainer$2.removeEldestEntry(CoreContainer.java:385)
>> >       at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:410)
>> >       at java.util.HashMap.put(HashMap.java:385)
>> >       at
>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:864)
>> >       - locked <785614df8> (a org.apache.solr.core.CoreContainer$2)
>> >       at
>> org.apache.solr.core.CoreContainer.registerLazyCore(CoreContainer.java:829)
>> >       at
>> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1321)
>> >       at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
>> >       at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>> >       at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>> >       at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> >       at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>> >       at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> >       at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>> >       at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>> >       at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> >       at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>> >       at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> >       at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> >       at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> >       at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> >       at org.eclipse.jetty.server.Server.handle(Server.java:365)
>> >       at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>> >       at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> >       at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
>> >       at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
>> >       at
>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
>> >       at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> >       at
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> >       at
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> >       at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> >       at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> >       at java.lang.Thread.run(Thread.java:680)
>> >
>> > Found 1 deadlock.
>> >
>> >
>> >
>> >
>> > On Wed, Jan 30, 2013 at 11:06 AM, Mark Miller <markrmiller@gmail.com>
>> wrote:
>> >
>> > Without evidence of further damage, that is fine - it just means a
>> thread was interrupted.
>> >
>> > - Mark
>> >
>> > On Jan 30, 2013, at 10:52 AM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>> >
>> > > FWIW, by the way, I'm getting some exceptions in the solr log, here
>> are the two patterns I see on a quick look (this is with my hacks, but the
>> test of your patch also produced some like this, I think they were the
>> same). Your comment about trying this with the old-style XML is getting
>> more compelling....
>> > >
>> > > But do note that the stress tests still ran to completion OK, which
>> means that all documents sent to the server made it into the indexes....
>> > >
>> > > Jan 30, 2013 10:20:27 AM org.apache.solr.common.SolrException log
>> > > SEVERE: java.lang.InterruptedException
>> > >       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
>> > >       at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> > >       at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> > >       at
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:597)
>> > >       at
>> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> > >       at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > >       at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > >       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> > >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > >       at java.lang.Thread.run(Thread.java:680)
>> > >
>> > >
>> > > FWIW
>> > >
>> > >
>> > > On Wed, Jan 30, 2013 at 8:46 AM, Erick Erickson <
>> erickerickson@gmail.com> wrote:
>> > > No joy. It ran to completion on one of my machines for an hour, but
>> not the other, stack traces below.
>> > >
>> > > About running without the patch. The other thing that's different is
>> I've changed where the core.close happens in the patch as opposed to the
>> old code, which may confuse things. OTOH, it'd be interesting to try since
>> my changes to where I called close were based on a faulty assumption about
>> where the lock was occurring. I did a quick hack that I have yet to test
>> making the stress tester bang up old-style solr.xml setups, but haven't run
>> it yet. I can give it a try against an unmodified trunk if you think that
>> would generate useful information but I sadly fear it's an apples/oranges
>> comparison.
>> > >
>> > > I'm on IM or we can voice chat if you want to strategize, but I won't
>> be able to devote much time to this until tonight or tomorrow. I can apply
>> patches and run tests all day though....
>> > >
>> > > Although before digging to deeply, I had to cobble the patch into
>> DefaultSolrCoreState and I might have screwed it up. Is the finally block
>>  containing if (yielded) in the right place?
>> > >
>> > > Here's my patched code, but maybe you could just send me the whole
>> file? I haven't changed it outside this patch....
>> > >
>> > >     synchronized (writerPauseLock) {
>> > >       // we need to wait for the Writer to fall out of use
>> > >       // first lets stop it from being lent out
>> > >       pauseWriter = true;
>> > >       // then lets wait until its out of use
>> > >       log.info("Waiting until IndexWriter is unused... core=" +
>> coreName);
>> > >       boolean yielded = false;
>> > >       try {
>> > >         while (!writerFree) {
>> > >           // yield the commit lock
>> > >           core.getUpdateHandler().yieldCommitLock();
>> > >           yielded = true;
>> > >           try {
>> > >             writerPauseLock.wait(100);
>> > >           } catch (InterruptedException e) {}
>> > >           if (closed) {
>> > >             throw new RuntimeException("SolrCoreState already
>> closed");
>> > >           }
>> > >         }
>> > >       } finally {
>> > >         if (yielded) {
>> > >         core.getUpdateHandler().getCommitLock();
>> > >         }
>> > >       }
>> > >       try {
>> > >         if (indexWriter != null) {
>> > >           if (!rollback) {
>> > >             try {
>> > >
>> > >
>> > >
>> > > Stack trace for the deadlock bits, full file attached:
>> > >
>> > > Found one Java-level deadlock:
>> > > =============================
>> > > "commitScheduler-36850-thread-1":
>> > >   waiting to lock monitor 7fc2c625f7f8 (object 7429863a0, a
>> org.apache.solr.update.DefaultSolrCoreState),
>> > >   which is held by "qtp132616134-30"
>> > > "qtp132616134-30":
>> > >   waiting for ownable synchronizer 740bc1108, (a
>> java.util.concurrent.locks.ReentrantLock$NonfairSync),
>> > >   which is held by "commitScheduler-36850-thread-1"
>> > >
>> > > Java stack information for the threads listed above:
>> > > ===================================================
>> > > "commitScheduler-36850-thread-1":
>> > >       at
>> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>> > >       - waiting to lock <7429863a0> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > >       at
>> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
>> > >       at
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
>> > >       - locked <74066b838> (a java.lang.Object)
>> > >       at
>> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> > >       at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > >       at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > >       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> > >       at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > >       at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > >       at java.lang.Thread.run(Thread.java:680)
>> > > "qtp132616134-30":
>> > >       at sun.misc.Unsafe.park(Native Method)
>> > >       - parking to wait for  <740bc1108> (a
>> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> > >       at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>> > >       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>> > >       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>> > >       at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>> > >       at
>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>> > >       at
>> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>> > >       at
>> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
>> > >       at
>> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
>> > >       - locked <7429863a0> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > >       at
>> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:280)
>> > >       - locked <7429863a0> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > >       at
>> org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
>> > >       - locked <7429863a0> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > >       at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
>> > >       at
>> org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1466)
>> > >       at
>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
>> > >       at
>> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1138)
>> > >       at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
>> > >       at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>> > >       at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>> > >       at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> > >       at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>> > >       at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> > >       at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>> > >       at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>> > >       at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> > >       at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>> > >       at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> > >       at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> > >       at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> > >       at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> > >       at org.eclipse.jetty.server.Server.handle(Server.java:365)
>> > >       at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>> > >       at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> > >       at
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>> > >       at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>> > >       at
>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
>> > >       at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>> > >       at
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> > >       at
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> > >       at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> > >       at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> > >       at java.lang.Thread.run(Thread.java:680)
>> > >
>> > > Found 1 deadlock.
>> > >
>> > >
>> > >
>> > > On Wed, Jan 30, 2013 at 7:11 AM, Erick Erickson <
>> erickerickson@gmail.com> wrote:
>> > > bq: I don't follow this at all - of course you could rapidly load and
>> unload cores at the same time before this patch?
>> > >
>> > > Not quite what I was trying to say. The stress test opens and closes
>> cores a LOT. Of course you could open and close cores simultaneously
>> before. In fact given what I think is a new pattern I'm amazed that there
>> aren't a lot more problems, that's some damn good code. What's new is the
>> stress test opens and closes a core with every call. From 30 threads, 15
>> indexing and 15 querying.
>> > >
>> > > bq: If you cannot easily produce a test that causes deadlock without
>> your patch
>> > >
>> > > I don't know how you'd really write a predictable junit test, the
>> time window for the race here is small. I had to run my stress test for
>> 20-30 minutes to hit it. I think I can modify the stress test to use
>> old-style solr.xml which I could run against current trunk, is it worth it
>> though after your second look? Note that this open/closing a core every
>> call from the stress test wasn't really possible the same way before
>> SOLR-1028, one of the keys is that the transient cores have to be aged out.
>> The bolds below are new code (SOLR-1028+):
>> > >
>> > > at
>> org.apache.solr.core.CoreMaps$1.removeEldestEntry(CoreContainer.java:1384)
>> > > ......
>> > > at
>> org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1444)
>> > >
>> > >
>> > > If you still think it's worthwhile, I have some travel time tomorrow
>> that I could use to make this test run with old-style solr.xml. Let me
>> know. I'm also wondering if the stress test should go into our test suite
>> somewhere. It's possible to bring the stress test into junit I think,
>> there's nothing magic about it. But it might be better as an external test
>> that we run on, say, a nightly or weekly basis, is there precedent?
>> > >
>> > > bq: Because they are different locks protecting different state.
>> > >
>> > > So you're saying that synchronizing the method actually is protecting
>> the SolrCore that's passed as a parameter? Otherwise I don't get it, seems
>> like moving the synchronized block to the first line of, e.g.,
>>  newIndexWriter and removing synchronized from the method signature would
>> be sufficient. That said, my hack of removing the synchronized from the
>> method signature was more to poke that what I thought I saw than a well
>> thought-out solution... Which is why I'm glad you're looking too...
>> > >
>> > > Yes, the code attached to the JIRA is the latest. You've got enough
>> on your plate I'm sure, I'll apply your patch and let you know. I had a
>> little trouble with SVN applying it cleanly, but  I think I reconciled it
>> correctly...
>> > >
>> > >
>> > > On Tue, Jan 29, 2013 at 11:43 PM, Mark Miller <markrmiller@gmail.com>
>> wrote:
>> > > Do you have your latest work attached to the issue? If so, I'll start
>> working with it locally.
>> > >
>> > > For now, can you try this experimental, test patch and see what the
>> results are?
>> > >
>> > > Index:
>> solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
>> > > ===================================================================
>> > > ---
>> solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
>> (revision 1440275)
>> > > +++
>> solr/core/src/java/org/apache/solr/update/DefaultSolrCoreState.java
>> (working copy)
>> > > @@ -135,13 +135,24 @@
>> > >        pauseWriter = true;
>> > >        // then lets wait until its out of use
>> > >        log.info("Waiting until IndexWriter is unused... core=" +
>> coreName);
>> > > +
>> > > +      boolean yielded = false;
>> > > +      try {
>> > >        while (!writerFree) {
>> > > -        try {
>> > > -          writerPauseLock.wait(100);
>> > > -        } catch (InterruptedException e) {}
>> > > -
>> > > -        if (closed) {
>> > > -          throw new RuntimeException("SolrCoreState already closed");
>> > > +          // yield the commit lock
>> > > +          core.getUpdateHandler().yieldCommitLock();
>> > > +          yielded = true;
>> > > +          try {
>> > > +            writerPauseLock.wait(100);
>> > > +          } catch (InterruptedException e) {}
>> > > +
>> > > +          if (closed) {
>> > > +            throw new RuntimeException("SolrCoreState already
>> closed");
>> > > +          }
>> > > +        }
>> > > +      } finally {
>> > > +        if (yielded) {
>> > > +          core.getUpdateHandler().getCommitLock();
>> > >          }
>> > >        }
>> > >
>> > > Index: solr/core/src/java/org/apache/solr/update/UpdateHandler.java
>> > > ===================================================================
>> > > --- solr/core/src/java/org/apache/solr/update/UpdateHandler.java
>>    (revision 1440275)
>> > > +++ solr/core/src/java/org/apache/solr/update/UpdateHandler.java
>>    (working copy)
>> > > @@ -189,4 +189,10 @@
>> > >    }
>> > >
>> > >    public abstract void split(SplitIndexCommand cmd) throws
>> IOException;
>> > > +
>> > > +
>> > > +  public void getCommitLock() {}
>> > > +
>> > > +
>> > > +  public void yieldCommitLock() {}
>> > >  }
>> > > Index:
>> solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
>> > > ===================================================================
>> > > ---
>> solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
>> (revision 1440275)
>> > > +++
>> solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
>> (working copy)
>> > > @@ -830,4 +830,13 @@
>> > >    public CommitTracker getSoftCommitTracker() {
>> > >      return softCommitTracker;
>> > >    }
>> > > +
>> > > +  public void getCommitLock() {
>> > > +    commitLock.lock();
>> > > +  }
>> > > +
>> > > +
>> > > +  public void yieldCommitLock() {
>> > > +    commitLock.unlock();
>> > > +  }
>> > >  }
>> > >
>> > >
>> > > On Jan 29, 2013, at 11:24 PM, Mark Miller <markrmiller@gmail.com>
>> wrote:
>> > >
>> > > > Digging into the stack traces...
>> > > >
>> > > > This shows a thread waiting for the commit lock trying to close an
>> index writer.
>> > > >
>> > > > There is another thread with the commit lock that is waiting for
>> the writer to be returned.
>> > > >
>> > > > That seems to be the situation - a race around the commit lock.
>> > > >
>> > > > Needs some thought.
>> > > >
>> > > > - Mark
>> > > >
>> > > > On Jan 29, 2013, at 8:31 AM, Erick Erickson <
>> erickerickson@gmail.com> wrote:
>> > > >
>> > > >> Java stack information for the threads listed above:
>> > > >> ===================================================
>> > > >> "commitScheduler-42617-thread-1":
>> > > >> at
>> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:78)
>> > > >> - waiting to lock <78b4aa518> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > > >> at
>> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1359)
>> > > >> at
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:561)
>> > > >> - locked <7884ca730> (a java.lang.Object)
>> > > >> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> > > >> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > > >> at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > > >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > > >> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>> > > >> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>> > > >> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> > > >> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > > >> at java.lang.Thread.run(Thread.java:680)
>> > > >>
>> > > >> *********
>> > > >> Other thread
>> > > >> "qtp1401888126-32":
>> > > >> at sun.misc.Unsafe.park(Native Method)
>> > > >> - parking to wait for  <788d73208> (a
>> > > >> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>> > > >> at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>> > > >> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>> > > >> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>> > > >> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>> > > >> at
>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>> > > >> at
>> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>> > > >> at
>> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:668)
>> > > >> at
>> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:64)
>> > > >> - locked <78b4aa518> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > > >> at
>> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:272)
>> > > >> - locked <78b4aa518> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > > >> at
>> org.apache.solr.core.SolrCore.decrefSolrCoreState(SolrCore.java:888)
>> > > >> - locked <78b4aa518> (a
>> org.apache.solr.update.DefaultSolrCoreState)
>> > > >> at org.apache.solr.core.SolrCore.close(SolrCore.java:980)
>> > > >> at
>> org.apache.solr.core.CoreMaps.putTransientCore(CoreContainer.java:1465)
>> > > >> at
>> org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:730)
>> > > >> at
>> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1137)
>> > > >> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:190)
>> > > >> at
>> > > >
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: dev-help@lucene.apache.org
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>> >
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Mime
View raw message