lucene-dev mailing list archives

From "Peter Horvath (JIRA)" <>
Subject [jira] [Updated] (SOLR-9129) Solr Cloud hangs when creating large number of collections and node fails to recover after restart
Date Wed, 18 May 2016 14:22:12 GMT


Peter Horvath updated SOLR-9129:
    Attachment: exception2.txt

EndOfStreamException: Unable to read additional data from client sessionid 0x154a47d2a290000,
likely client has closed socket

> Solr Cloud hangs when creating large number of collections and node fails to recover after restart
> --------------------------------------------------------------------------------------------------
>                 Key: SOLR-9129
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 6.0
>         Environment: OS: GNU Linux, kernel 4.4.0-22 on x86_64 (Ubuntu Linux 16.04 LTS)
> RAM: 16 GB
> CPU: Intel Core i7-4720HQ CPU @ 2.60GHz × 8
> Java version: Oracle JDK 1.8.0_92 (x64) build 1.8.0_92-b14 Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
>            Reporter: Peter Horvath
>         Attachments: exception1.txt, exception2.txt, exception3.txt
> I attempted to benchmark SolrCloud to see how well it would work with a sample data set of ours.
> I wanted to create about 2500 empty collections first to see how that would scale.
> Unfortunately, the test was not successful. Solr started failing after creating around 2000 collections, and the cluster failed to recover after a complete restart, which is quite concerning to me.
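
The report does not include the script used to create the collections. A minimal sketch of such a loop is given below, using the Collections API `CREATE` action over HTTP. The base URL, the `bench_` name prefix, the shard and replica counts, and the `gettingstarted` config name are all assumptions chosen for illustration; the reporter's actual values are not stated.

```python
import urllib.parse
import urllib.request

# Assumption: a Solr node is reachable here; adjust to your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_url(name, num_shards=2, replication_factor=2, config="gettingstarted"):
    """Build a Collections API CREATE request URL for one collection.

    The shard count, replica count, and config name are illustrative
    defaults, not values taken from the report.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config,
        "wt": "json",
    }
    return SOLR_BASE + "/admin/collections?" + urllib.parse.urlencode(params)


def create_collections(count, prefix="bench_", timeout=180):
    """Create empty collections one at a time.

    Returns (number_created, first_failure); first_failure is None when
    every request succeeded. Stops at the first failed request so the
    point of failure is easy to see.
    """
    for i in range(count):
        name = f"{prefix}{i:04d}"
        try:
            with urllib.request.urlopen(create_url(name), timeout=timeout) as resp:
                resp.read()  # drain the response body
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            return i, f"{name}: {exc}"
    return count, None
```

Calling `create_collections(2500)` against a test cluster and printing the returned pair would show how many collections were created before the first failed request, which is the figure the report gives as roughly 2000.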
> I based my environment on the cloud example (I use the same config set as the gettingstarted example collection, etc.), so I have a vanilla install and used the following commands to bring the nodes online:
> .../solr/6.0.0/bin/solr start -m 2g -cloud -p 8983 -s ".../solr/6.0.0/example/cloud/node1/solr"
> .../solr/6.0.0/bin/solr start -m 2g -cloud -p 7574 -s ".../solr/6.0.0/example/cloud/node2/solr" -z localhost:9983
> .../solr/6.0.0/bin/solr start -m 2g -cloud -p 8984 -s ".../solr/6.0.0/example/cloud/node3/solr" -z localhost:9983
> .../solr/6.0.0/bin/solr start -m 2g -cloud -p 7575 -s ".../solr/6.0.0/example/cloud/node4/solr" -z localhost:9983
> After about 2000 collections were created, Solr hung and REST requests started failing. I found the following entry in the logs, which I could relate to the failed REST request. For further logs, please see the attachments to this issue.
> null:org.apache.solr.common.SolrException: Could not fully create collection: FOOBAR
> 	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(
> 	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> 	at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(
> 	at
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> 	at
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> 	at org.eclipse.jetty.server.Server.handle(
> 	at org.eclipse.jetty.server.HttpChannel.handle(
> 	at org.eclipse.jetty.server.HttpConnection.onFillable(
> 	at$ReadCallback.succeeded(
> 	at
> 	at$
> 	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(
> 	at
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$
> 	at
> After the affected Solr instance failed to recover, I decided to restart the whole cluster (using the official solr stop and start commands). Unfortunately, after this, at least one node remained spinning in ZooKeeper logic, creating more than four thousand (!!) threads.

This message was sent by Atlassian JIRA
