lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11881) Connection Reset Causing LIR
Date Sat, 05 May 2018 01:54:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464563#comment-16464563
] 

Mark Miller commented on SOLR-11881:
------------------------------------

bq. if we don't set the timeout on the HttpPost request by setting a request config

Cool - the tricky part is, if only one of the properties of the two is overridden on the client
itself on the fly, we still want to pick up the default for the other one.

> Connection Reset Causing LIR
> ----------------------------
>
>                 Key: SOLR-11881
>                 URL: https://issues.apache.org/jira/browse/SOLR-11881
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-11881-SolrCmdDistributor.patch, SOLR-11881.patch, SOLR-11881.patch
>
>
> We can see that a connection reset is causing LIR.
> If a leader -> replica update get's a connection like this the leader will initiate
LIR
> {code:java}
> 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:collection s:shardX r:core_node56
collection_shardX_replicaY] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start
recovery on replica https://host08.domain:8985/solr/collection_shardX_replicaY/
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:210)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
>         at sun.security.ssl.InputRecord.read(InputRecord.java:503)
>         at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
>         at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
>         at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
>         at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
>         at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
>         at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
>         at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>         at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312)
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy working SolrCloud
cluster, even a rare response like this from a replica can cause a recovery and heavy cluster
disruption" .
> Looking at SOLR-6931 we added a http retry handler but we only retry on GET requests.
Updates are POST requests {{ConcurrentUpdateSolrClient#sendUpdateStream}}
> Update requests between the leader and replica should be retry-able since they have been
versioned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message