Hi,
I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out exactly how much
throughput my cluster can handle.
In my tests I consistently see a replica go into the recovering state and stay there, caused
by what looks like a timeout during replication. I can understand the timeout and failure (I
am hitting it fairly hard), but what seems odd to me is that when I stop the heavy load it
still does not recover the next time it tries. It seems broken forever until I manually go in,
clear the index and let it do a full resync.
Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 shards, 2 replicas;
AWS m3.2xlarge). I am indexing with ~800 concurrent connections and a 10 sec soft commit.
I consistently hit this problem at a throughput of around 1.5 million documents per hour.
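For scale, here is a back-of-the-envelope conversion of those figures. It is only a sketch: it assumes documents are spread evenly across the two shards and arrive at a steady rate, neither of which I have verified, and the variable names are just for illustration.

```python
# Rough conversion of the load figures quoted above.
# Assumptions (not measured): even distribution across shards, steady arrival rate.
DOCS_PER_HOUR = 1_500_000      # approximate throughput at which the problem appears
NUM_SHARDS = 2                 # 2 shards in the cluster
SOFT_COMMIT_SECONDS = 10       # soft commit interval

SECONDS_PER_HOUR = 3600

# Documents per second across the whole cluster.
cluster_rate = DOCS_PER_HOUR / SECONDS_PER_HOUR

# Documents per second handled by each shard, if routing is even.
per_shard_rate = cluster_rate / NUM_SHARDS

# Documents indexed cluster-wide between two consecutive soft commits.
docs_per_soft_commit = cluster_rate * SOFT_COMMIT_SECONDS

print(f"cluster: {cluster_rate:.1f} docs/sec")
print(f"per shard: {per_shard_rate:.1f} docs/sec")
print(f"per soft commit window: {docs_per_soft_commit:.0f} docs")
```

So the failure shows up at roughly 417 documents per second cluster-wide, or about 208 per second per shard under those assumptions.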
Thanks all,
Darren
Stack Traces & Messages:
[qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Error while trying to recover. core=assets_shard2_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://xxx.xxx.15.171:8080/solr
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
at org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:452)
... 6 more
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - trying again... (0) core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - interrupted. core=assets_shard2_replica1
853915 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Recovery failed - I give up. core=assets_shard2_replica1
853918 [RecoveryThread] WARN org.apache.solr.cloud.RecoveryStrategy – Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1
853933 [Thread-382] WARN org.apache.solr.cloud.RecoveryStrategy – Stopping recovery for zkNodeName=xxx.xxx.15.174:8080_solr_assets_shard2_replica1core=assets_shard2_replica1