lucene-solr-user mailing list archives

From Horváth Péter Gergely <peter.gergely.horv...@gmail.com>
Subject Fwd: Solr Cloud 6.0.0 hangs when creating large amount of collections and node fails to recover after restart
Date Thu, 12 May 2016 15:08:26 GMT
Hi All,

I am experimenting with Solr Cloud 6.0.0 to benchmark its performance in
our Linux environment. I have been using the basic cloud example with some
tweaks (4 nodes instead of 2; 2GB RAM instead of 512MB).

Basically, I run the cluster with the following commands:

.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8983 -s ".../solr/6.0.0/example/cloud/node1/solr"
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7574 -s ".../solr/6.0.0/example/cloud/node2/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8984 -s ".../solr/6.0.0/example/cloud/node3/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7575 -s ".../solr/6.0.0/example/cloud/node4/solr" -z localhost:9983

As part of the benchmark, I attempted to create about 2500 collections to see
how well that would work for us. Unfortunately, the experiment yielded
disappointing results: after about 2000 collections had been created, Solr
hung and REST requests started failing. I found the following in the logs:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.1.1:8984/solr: Error CREATEing SolrCore
'FOOBAR_shard2_replica1': Unable to create core [FOOBAR_shard2_replica1]
Caused by: KeeperErrorCode = Session expired for
/configs/default/lang/contractions_it.txt
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:577)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:198)
at
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

null:org.apache.solr.common.SolrException: Could not fully create
collection: FOOBAR
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:266)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:197)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)

EndOfStreamException: Unable to read additional data from client sessionid
0x154a47d2a290000, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at
org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at
org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)


I looked at the internals with VisualVM and created a thread dump; it
seemed to me that Solr creates a massive number (several hundred) of
SearcherExecutor threads, which are never stopped or disposed of. As one of
the cluster nodes went permanently down, I attempted to restart all nodes,
after which one node failed to recover. It looked to me like an issue in
ZooKeeper registration: a vast number of threads were blocked
in org.apache.solr.common.cloud.ZkStateReader$StateWatcher.process(ZkStateReader.java:834),
similar to the following stack trace extract:

"zkCallback-4-thread-122-processing-n:127.0.1.1:7574_solr" #2750 prio=5
os_prio=0 tid=0x00007fe3a5160800 nid=0x5278 waiting for monitor entry
[0x00007fe4352fc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.solr.common.cloud.ZkStateReader$StateWatcher.process(ZkStateReader.java:834)
- waiting to lock <0x00000000a1f7b948> (a
org.apache.solr.common.cloud.ZkStateReader)
at
org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:266)
at
org.apache.solr.common.cloud.SolrZkClient$3$$Lambda$2/221500830.run(Unknown
Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$3/554496567.run(Unknown
Source)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
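
To quantify how many threads of each kind are in which state, a thread dump like the one above can be summarized with a small script. The sketch below assumes the usual HotSpot dump layout (a quoted thread name at the start of the header line, followed by a "java.lang.Thread.State:" line); the function name and the rule for grouping threads by name prefix are illustrative choices only.

```python
import re
from collections import Counter

# A thread entry starts with the quoted thread name at the beginning of a line.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# The state line follows the header, e.g. "java.lang.Thread.State: BLOCKED".
STATE_LINE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def summarize_thread_dump(text):
    """Count threads per (name prefix, state).

    The prefix is the thread name up to the first "-<digits>" part, so
    "zkCallback-4-thread-122-..." is grouped under "zkCallback".
    """
    counts = Counter()
    current_prefix = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current_prefix = re.split(r'-\d+', header.group("name"),
                                      maxsplit=1)[0]
            continue
        state = STATE_LINE.search(line)
        if state and current_prefix is not None:
            counts[(current_prefix, state.group("state"))] += 1
            current_prefix = None  # count each thread only once
    return counts


def print_summary(path):
    """Print the groups of a dump file, most frequent first."""
    with open(path, encoding="utf-8") as handle:
        counts = summarize_thread_dump(handle.read())
    for (prefix, state), number in counts.most_common():
        print(f"{number:6d}  {state:15s}  {prefix}")
```

Running print_summary on a saved dump file lists the largest thread groups first, which makes a pile-up of blocked callback threads easy to spot.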


Has anyone seen similar issues with Solr Cloud 6.0.0, or does anyone have an
idea what the root cause could be?


Thanks,
Peter
