lucene-dev mailing list archives

From "Peter Horvath (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-9129) Solr Cloud hangs when creating large number of collections and node fails to recover after restart
Date Wed, 18 May 2016 14:20:12 GMT
Peter Horvath created SOLR-9129:
-----------------------------------

             Summary: Solr Cloud hangs when creating large number of collections and node
fails to recover after restart
                 Key: SOLR-9129
                 URL: https://issues.apache.org/jira/browse/SOLR-9129
             Project: Solr
          Issue Type: Bug
          Components: Server
    Affects Versions: 6.0
         Environment: OS: GNU Linux, kernel 4.4.0-22 on x86_64 (Ubuntu Linux 16.04 LTS (64-bit))
RAM: 16 GB
CPU: Intel Core i7-4720HQ CPU @ 2.60GHz × 8
Java version: Oracle JDK 1.8.0_92 (x64) build 1.8.0_92-b14 Java HotSpot(TM) 64-Bit Server
VM (build 25.92-b14, mixed mode)
            Reporter: Peter Horvath


I attempted to benchmark SolrCloud to see how well it would work with some sample data set
of ours. 
I wanted to create about 2500 empty collections first to see how that would scale.

Unfortunately, the test was not successful. Solr started failing after creating around 2000
collections, and the cluster failed to recover even after a complete restart, which is quite
concerning to me.

I based my environment on the cloud example (same config set as the gettingstarted example
collection, etc.), so this is a vanilla install. I used the following commands to bring the
nodes online:

.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8983 -s ".../solr/6.0.0/example/cloud/node1/solr"
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7574 -s ".../solr/6.0.0/example/cloud/node2/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 8984 -s ".../solr/6.0.0/example/cloud/node3/solr" -z localhost:9983
.../solr/6.0.0/bin/solr start -m 2g -cloud -p 7575 -s ".../solr/6.0.0/example/cloud/node4/solr" -z localhost:9983
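
The report does not include the script used to create the collections. A minimal sketch of
how such a run could be driven through the Collections API CREATE action, assuming a node on
localhost:8983, a config set named "gettingstarted" already uploaded to ZooKeeper, one shard
and one replica per collection, and a hypothetical "bench_" name prefix (none of these values
are confirmed by the report):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions for illustration only; the report does not state these values.
BASE_URL = "http://localhost:8983/solr"
CONFIG_NAME = "gettingstarted"


def create_url(name, num_shards=1, replication_factor=1, base_url=BASE_URL):
    """Build a Collections API CREATE request URL for one collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": CONFIG_NAME,
        "wt": "json",
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def create_collections(count, prefix="bench_", opener=urlopen, timeout=180):
    """Create collections one at a time.

    Stops at the first failed request and returns the number of
    collections that were created successfully before it.
    """
    for i in range(count):
        url = create_url(f"{prefix}{i}")
        try:
            # urlopen raises HTTPError (an OSError subclass) on 4xx/5xx
            # responses and a timeout error if the node stops answering.
            with opener(url, timeout=timeout):
                pass
        except OSError as exc:
            print(f"creation of {prefix}{i} failed: {exc}")
            return i
    return count
```

Calling create_collections(2500) would attempt the run described above and return how many
collections were created before the first failed request.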

After about 2000 collections were created, Solr hung and REST requests started failing. I
found the following entry in the logs, which I could relate to the failed REST request. For
further logs, please see the attachment of this issue.

null:org.apache.solr.common.SolrException: Could not fully create collection: FOOBAR
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:266)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:197)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:518)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:745)

After the affected Solr instance failed to recover, I restarted the whole cluster (using the
official solr stop and start commands). Unfortunately, after the restart at least one node
remained spinning in ZooKeeper logic, creating more than four thousand (!!) threads.
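
The report does not say how the thread count was measured. One way to check it on Linux,
sketched here as an assumption rather than the reporter's method, is to read the "Threads:"
field of /proc/<pid>/status for the Solr JVM process:

```python
from pathlib import Path


def parse_thread_count(status_text):
    """Extract the thread count from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            # The value follows the colon, separated by whitespace.
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Return the number of threads of a running process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

Sampling thread_count(pid) every few seconds after a restart would show whether the number of
threads keeps growing or levels off.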



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

