lucene-solr-user mailing list archives

From Tim Vaillancourt <...@elementspace.com>
Subject Re: SolrCloud 4.x hangs under high update volume
Date Wed, 04 Sep 2013 17:22:22 GMT
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone quickly explain what locks
SolrCloud takes on updates? Was I on to something in guessing that more
shards decrease the chance of locking?
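To make the question concrete: the hot frames in my thread dump all sit in
`java.util.concurrent.Semaphore.acquire()`. Here's a minimal, self-contained
Java sketch (illustrative only -- the names and permit count are mine, not
Solr's actual internals) of how a fixed-permit semaphore makes threads park
once all permits are taken:

```java
import java.util.concurrent.Semaphore;

// Illustrative only: a fixed-permit semaphore, like the one behind the
// AdjustableSemaphore frames in the thread dump. Once all permits are
// taken, every further acquire() parks its thread -- which is what the
// "WAITING (parking)" stacks show.
public class SemaphoreDemo {
    public static void main(String[] args) throws InterruptedException {
        Semaphore permits = new Semaphore(2); // pretend: max in-flight updates

        permits.acquire(); // update 1 in flight
        permits.acquire(); // update 2 in flight

        // A third "update" cannot proceed; tryAcquire() shows it would block.
        boolean admitted = permits.tryAcquire();
        System.out.println("third update admitted: " + admitted); // false

        permits.release(); // an in-flight update completes
        System.out.println("after release: " + permits.tryAcquire()); // true
    }
}
```

If the permit count is small relative to the number of Jetty threads pushing
updates, everything above the semaphore backs up exactly like the dump shows.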

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java or the Solr codebase (working on
that though :D).
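Also, in case it helps anyone else staring at one of these dumps: a
quick-and-dirty sketch that counts how many threads are parked on each lock
address (rough, untested against edge cases; the input filename is just a
placeholder):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch: group threads in a jstack dump by the lock address they
// are parked on ("- parking to wait for  <0x...>"). A large count on one
// address means many threads are queued behind the same lock.
public class LockCounter {
    static Map<String, Integer> countParkedLocks(Iterable<String> lines) {
        Pattern p = Pattern.compile("parking to wait for\\s+<(0x[0-9a-f]+)>");
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "solr-jstack.txt" is a placeholder for your own dump file.
        Map<String, Integer> counts =
            countParkedLocks(Files.readAllLines(Paths.get("solr-jstack.txt")));
        counts.forEach((addr, n) ->
            System.out.println(addr + " : " + n + " threads parked"));
    }
}
```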

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller <markrmiller@gmail.com> wrote:

> There is an issue if I remember right, but I can't find it right now.
>
> If anyone that has the problem could try this patch, that would be very
> helpful: http://pastebin.com/raw.php?i=aaRWwSGP
>
> - Mark
>
>
> On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma <markus.jelsma@openindex.io
> >wrote:
>
> > Hi Mark,
> >
> > Got an issue to watch?
> >
> > Thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Mark Miller <markrmiller@gmail.com>
> > > Sent: Wednesday 4th September 2013 16:55
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud 4.x hangs under high update volume
> > >
> > > I'm going to try and fix the root cause for 4.5 - I've suspected what
> > > it is since early this year, but it's never personally been an issue,
> > > so it's rolled along for a long time.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt <tim@elementspace.com> wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I am looking into an issue we've been having with SolrCloud since the
> > > > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
> > > > 4.4.0 yet). I've noticed other users with this same issue, so I'd really
> > > > like to get to the bottom of it.
> > > >
> > > > Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
> > > > see stalled transactions that snowball to consume all Jetty threads in
> > > > the JVM. This eventually causes the JVM to hang, with most threads
> > > > waiting on the condition/stack provided at the bottom of this message.
> > > > At this point SolrCloud instances start to see their neighbors (who also
> > > > have all threads hung) as down w/"Connection Refused", and the shards
> > > > become "down" in state. Sometimes a node or two survives and just
> > > > returns 503 "no server hosting shard" errors.
> > > >
> > > > As a workaround/experiment, we have tuned the number of threads sending
> > > > updates to Solr, as well as the batch size (we batch updates from client
> > > > -> solr) and the Soft/Hard autoCommits, all to no avail. We also tried
> > > > turning off Client-to-Solr batching entirely (1 update = 1 call to
> > > > Solr), which did not help either. Certain combinations of update threads
> > > > and batch sizes seem to mask/help the problem, but not resolve it
> > > > entirely.
> > > >
> > > > Our current environment is the following:
> > > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > > - 3 x Zookeeper instances, external Java 7 JVM.
> > > > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
> > > > and a replica of 1 shard).
> > > > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
> > > > good day.
> > > > - 5000 max jetty threads (well above what we use when we are healthy),
> > > > Linux-user threads ulimit is 6000.
> > > > - Occurs under Jetty 8 or 9 (many versions).
> > > > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > > > - Occurs under several JVM tunings.
> > > > - Everything seems to point to Solr itself, and not a Jetty or Java
> > > > version (I hope I'm wrong).
> > > >
> > > > The stack trace that is holding up all my Jetty QTP threads is the
> > > > following, which seems to be waiting on a lock that I would very much
> > > > like to understand further:
> > > >
> > > > "java.lang.Thread.State: WAITING (parking)
> > > >    at sun.misc.Unsafe.park(Native Method)
> > > >    - parking to wait for  <0x00000007216e68d8> (a java.util.concurrent.Semaphore$NonfairSync)
> > > >    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > > >    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > > >    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > > >    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > > >    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > > >    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > > >    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > > >    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> > > >    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
> > > >    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
> > > >    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
> > > >    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
> > > >    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > >    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
> > > >    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
> > > >    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
> > > >    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> > > >    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
> > > >    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
> > > >    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
> > > >    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
> > > >    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
> > > >    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
> > > >    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
> > > >    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
> > > >    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
> > > >    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
> > > >    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
> > > >    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
> > > >    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> > > >    at org.eclipse.jetty.server.Server.handle(Server.java:445)
> > > >    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
> > > >    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
> > > >    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
> > > >    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
> > > >    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
> > > >    at java.lang.Thread.run(Thread.java:724)"
> > > >
> > > > Some questions I had were:
> > > > 1) What exclusive locks does SolrCloud "make" when performing an update?
> > > > 2) Keeping in mind I do not read or write java (sorry :D), could someone
> > > > help me understand "what" solr is locking in this case at
> > > > "org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)"
> > > > when performing an update? That will help me understand where to look
> > > > next.
> > > > 3) It seems all threads in this state are waiting for
> > > > "0x00000007216e68d8"; is there a way to tell what "0x00000007216e68d8"
> > > > is?
> > > > 4) Is there a limit to how many updates you can do in SolrCloud?
> > > > 5) Wild-ass-theory: would more shards provide more locks (whatever they
> > > > are) on update, and thus more update throughput?
> > > >
> > > > To those interested, I've provided a stacktrace of 1 of 3 nodes at this
> > > > URL in gzipped form:
> > > > https://s3.amazonaws.com/timvaillancourt.com/tmp/solr-jstack-2013-08-23.gz
> > > >
> > > > Any help/suggestions/ideas on this issue, big or small, would be much
> > > > appreciated.
> > > >
> > > > Thanks so much all!
> > > >
> > > > Tim Vaillancourt
> > >
> >
>
>
>
> --
> - Mark
>
