lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Socket \ Connection Timeout Values
Date Thu, 03 Sep 2015 06:35:34 GMT
On 9/3/2015 12:06 AM, Arnon Yogev wrote:
> I wanted to ask about the implications of different timeout values one can 
> use. 
> 
> For example:
> From what I see in the code, the default socket timeout value for Solr is 
> 0.
> Does that mean Solr nodes will wait to update \ receive update from each 
> other without any timeout?

The socket timeout is a property of the TCP connection, which is
ultimately handled by the operating system.  Solr uses HTTP, which is a
TCP-based protocol.  This is not specific to Solr.

A value of zero means the operating system won't time out and disconnect
the TCP session.  Generally you want your servers to have no socket
timeout, and depending on exactly what you are doing, *maybe* you will
configure a socket timeout on the client side.  For ZooKeeper, there is
no need to have a socket timeout, as you will see when I continue below.
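
To make the client-side case concrete, here is a minimal sketch using plain
java.net sockets.  It is not Solr or SolrJ code; the class and method names
are made up for illustration.  A local server socket never sends anything,
and the client gives up on the read once its socket timeout expires.  With
a timeout of zero the same read would block indefinitely.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Solr.
public class SocketTimeoutSketch {

    /** Returns true if a read from a silent peer times out after timeoutMs. */
    static boolean readTimesOut(int timeoutMs) throws IOException {
        // The server socket listens on a free loopback port but never writes.
        // The OS completes the TCP handshake through the listen backlog,
        // so the client connects even though accept() is never called.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort())) {
            // Socket (read) timeout in milliseconds; 0 means wait forever.
            client.setSoTimeout(timeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;  // no data arrived within timeoutMs
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The timeout here only bounds how long a single read waits for data; it does
not close the connection by itself.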

> In other words, can the following scenario happen:
> 1. One solr node becomes very slow for some reason, but is still 
> considered alive in ZK.
> 2. Other servers in the cluster try to update \ receive updates from this 
> node, but do not get responses.
> 3. Since there's no timeout defined, all nodes in the cluster will 
> eventually become unresponsive (when the thread pool is exhausted).

Even though the socket timeout is generally zero so the OS won't
terminate idle TCP connections, the application can take care of
timeouts and terminations.
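
As a rough sketch of what "the application takes care of it" can look like
(this is not how Solr implements it; the class and method names are
hypothetical), the caller below bounds the wait itself with
Future.get(timeout) instead of relying on a socket timeout:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not part of Solr.
public class AppTimeoutSketch {

    /** Runs the task and returns its result, or "timed out" after timeoutMs. */
    static String callWithTimeout(Callable<String> task, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            try {
                // The application, not the OS, decides how long to wait.
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the slow task
                return "timed out";
            }
        } finally {
            pool.shutdownNow(); // release the worker thread either way
        }
    }

    public static void main(String[] args) throws Exception {
        // A task that stands in for a request to a very slow node.
        String slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "done";
        }, 100);
        System.out.println(slow);
    }
}
```

Because the waiting thread is released when the timeout fires, a single slow
peer does not have to tie up the caller's thread pool indefinitely.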

Solr configures a zkClientTimeout. If I remember my last dive into
SolrCloud code correctly, this is passed more or less unchanged to the
ZooKeeper client as its session timeout.  If this timeout is exceeded
on pretty much any inter-server communication, SolrCloud will generally
mark the node down.

Historically there have been a lot of problems with SolrCloud nodes
being marked down due to garbage collection pauses that exceed the
timeout.  Since 5.0 this should be less of a problem, because the
included start scripts have aggressive GC tuning.

The zkClientTimeout defaults to 15 seconds internally inside Solr if you
do not have any configuration that sets the value, but most recent Solr
example configurations set it to 30 seconds.  In most situations, a 15
second timeout is VERY long ... if that's being exceeded, there is
usually a serious problem that needs fixing.

Thanks,
Shawn

