lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11881) Connection Reset Causing LIR
Date Tue, 08 May 2018 04:26:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466841#comment-16466841
] 

Mark Miller commented on SOLR-11881:
------------------------------------

Yeah, I'd be hesitant to add retries to DBQ without input from [~yseeley@gmail.com].

> Connection Reset Causing LIR
> ----------------------------
>
>                 Key: SOLR-11881
>                 URL: https://issues.apache.org/jira/browse/SOLR-11881
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-11881-SolrCmdDistributor.patch, SOLR-11881.patch, SOLR-11881.patch
>
>
> We can see that a connection reset is causing LIR.
> If a leader -> replica update get's a connection like this the leader will initiate
LIR
> {code:java}
> 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:collection s:shardX r:core_node56
collection_shardX_replicaY] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start
recovery on replica https://host08.domain:8985/solr/collection_shardX_replicaY/
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:210)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
>         at sun.security.ssl.InputRecord.read(InputRecord.java:503)
>         at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
>         at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
>         at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
>         at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
>         at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy working SolrCloud
cluster, even a rare response like this from a replica can cause a recovery and heavy cluster
disruption" .
> Looking at SOLR-6931 we added a http retry handler but we only retry on GET requests.
Updates are POST requests {{ConcurrentUpdateSolrClient#sendUpdateStream}}
> Update requests between the leader and replica should be retry-able since they have been
versioned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message