lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: ZK session times out intermittently
Date Tue, 20 Feb 2018 08:18:43 GMT
On 2/19/2018 3:33 PM, Roy Lim wrote:
> 6 x Solr (3 primary shard, 3 secondary)
> 3 x ZK
> The client is indexing over 16 million documents using 8 threads.  Auto-soft
> commit is 3 minutes, auto-commit is 10 minutes.

I would probably reduce the autoCommit time to 1 minute, as long as 
openSearcher is set to false, which is the recommended setting.  This is 
not necessary, but it would probably reduce the size of your transaction 
logs, which will make Solr restarts faster.
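A minimal sketch of that change, assuming the Solr Config API is available on your version and that you would rather use it than edit solrconfig.xml by hand. The Python below builds a `set-property` command for `updateHandler.autoCommit.maxTime`, `updateHandler.autoCommit.openSearcher` and `updateHandler.autoSoftCommit.maxTime` (1 minute, false, and your existing 3 minutes), and can POST it to `/solr/<collection>/config`. The function names, base URL and collection name are hypothetical.

```python
import json
import urllib.request


def build_commit_payload(autocommit_ms=60_000, softcommit_ms=180_000):
    """Build a Config API 'set-property' command for the commit settings.

    openSearcher is always False for the hard commit, so autoCommit only
    rolls the transaction log and flushes to disk; the soft commit still
    decides when new documents become visible to searches.
    """
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": autocommit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            "updateHandler.autoSoftCommit.maxTime": softcommit_ms,
        }
    }


def apply_commit_settings(base_url, collection, payload):
    """POST the payload to <base_url>/<collection>/config.

    base_url is something like "http://solr-host:8983/solr" (hypothetical).
    """
    req = urllib.request.Request(
        f"{base_url}/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the request body; call apply_commit_settings() to send it.
    print(json.dumps(build_commit_payload(), indent=2))
```

Settings applied this way are stored in configoverlay.json and take precedence over the values in solrconfig.xml, so it is worth knowing which of the two you are looking at when you check the configuration later.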

> The following timeout is observed in our client log, intermittently:

The log excerpt did not come through; there is nothing after this line.  
I checked Nabble as well, because when Nabble relays posts to the mailing 
list, content that is visible on their forum sometimes does not make it 
to the list.  In this case, Nabble didn't have the excerpt either.  If you 
can't get the data to stay in the message, you may need to use a paste 
website and provide a URL.

> Thinking that this is a case where ZK could no longer establish connection
> to Solr node it is communicating with, I went to the primary nodes and
> correlated the timestamps.  They all are very similar to below:

Again, there is nothing here for us to examine.

BTW, ZK does not connect to Solr.  Solr connects to ZK.  It's possible 
that you're already aware of this, but because of the way you phrased 
your comment, I cannot tell for sure.
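To make the direction concrete: Solr is the ZooKeeper client, and its session stays alive only as long as the ZK server keeps hearing from it. The toy model below is a simplification that ignores ZK's internal tick granularity; it takes the times at which the server heard from Solr and reports when the session would expire. The function name and the 30-second timeout are hypothetical, not Solr defaults.

```python
def first_expiry(heartbeats_ms, session_timeout_ms):
    """Return the time (ms) at which the ZK server would expire the session,
    or None if no gap between client contacts exceeds the timeout.

    heartbeats_ms: sorted times at which the server heard from the client
    (Solr). The server never calls Solr; it only notices silence. Silence
    after the last recorded contact is not considered.
    """
    for prev, cur in zip(heartbeats_ms, heartbeats_ms[1:]):
        if cur - prev > session_timeout_ms:
            # The session lapses one timeout after the last contact.
            return prev + session_timeout_ms
    return None


if __name__ == "__main__":
    # Contacts every 5 s, then 70 s of silence from the Solr side.
    beats = [0, 5_000, 10_000, 80_000]
    print(first_expiry(beats, 30_000))  # 40000
```

In this model an expiry is caused by silence on the Solr side (for example a stalled JVM or network trouble between the hosts), not by ZK waiting for Solr to answer a request.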

> Note the time gap of over 1 minute, which I can only surmise that ZK is
> waiting this whole time for Solr to return, only to timeout.  Is that
> reasonable?  Thing is I have no idea what is happening during that time
> and why Solr is taking so long.  Note the second statement signaling the
> start of the soft commit, so I don't think this is a case of a long commit.
> Finally, checking the GC logs, there are no long pauses either!
> Hoping an expert can shed some light here.

Because we can't actually see the information you've referenced, which I 
assume are excerpts from logfiles, it's difficult to make any kind of 
recommendation, or even make a guess.

We'll need to see your solr logfile, and maybe your ZK logfile. 
Hopefully there are ERROR logs that we can attempt to decipher, but 
you'll want the logging to be at the default level of INFO, so we can 
see the errors in context.  If Solr and ZK are on separate servers, 
you'll want to make sure that there is good time synchronization, so 
that timestamps in different logs are in sync with each other.
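Once the clocks agree, it helps to read both logs as one timeline. A minimal sketch, assuming each entry starts with a timestamp like `2018-02-19 22:33:00.123` (Solr) or `2018-02-19 22:33:00,123` (ZK); adjust the regex if your logging patterns differ. Lines without a leading timestamp, such as stack-trace continuations, are skipped. `merge_logs` and `find_gaps` are hypothetical helpers.

```python
import heapq
import re
from datetime import datetime, timedelta

# Assumed line prefix: date, time, then milliseconds after "." or ",".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,](\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS.match(line)
    if not m:
        return None  # e.g. a stack-trace continuation line
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(m.group(2)))


def timestamped(lines, source):
    """Yield (timestamp, source, line) for every line that has a timestamp."""
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            yield ts, source, line.rstrip("\n")


def merge_logs(named_logs):
    """Merge several logs (each already in time order) into one timeline."""
    streams = [timestamped(lines, name) for name, lines in named_logs.items()]
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))


def find_gaps(records, threshold):
    """Yield (start, end, gap) wherever consecutive entries are > threshold apart."""
    for (t0, _, _), (t1, _, _) in zip(records, records[1:]):
        if t1 - t0 > threshold:
            yield t0, t1, t1 - t0
```

Running `find_gaps` on a single log shows that log's quiet periods; running it on the merged timeline shows whether the other process was also silent during the same window.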

How have you determined that the GC log does not have long pauses?  Can 
you share a GC log that includes the timeframe where the problem happened?
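One way to check, assuming a Java 8 HotSpot JVM started with `-XX:+PrintGCApplicationStoppedTime`: that flag logs every safepoint stop, not just collections, so it can reveal pauses that a scan for GC events alone would miss. The sketch below sums those stops and lists the ones above a threshold. The function name and the 1-second threshold are hypothetical, and the regex assumes a period as the decimal separator.

```python
import re
import sys

# Assumed HotSpot (Java 8) line format for -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def summarize_stops(lines, threshold_s=1.0):
    """Return (total_s, longest_s, [(seconds, line), ...] for stops > threshold_s)."""
    total = 0.0
    longest = 0.0
    long_stops = []
    for line in lines:
        m = STOPPED.search(line)
        if not m:
            continue
        secs = float(m.group(1))
        total += secs
        longest = max(longest, secs)
        if secs > threshold_s:
            long_stops.append((secs, line.strip()))
    return total, longest, long_stops


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gc_stops.py /path/to/gc.log
    with open(sys.argv[1]) as f:
        total, longest, long_stops = summarize_stops(f)
    print(f"total stopped: {total:.3f}s, longest: {longest:.3f}s")
    for secs, line in long_stops:
        print(f"{secs:.3f}s  {line}")
```

If the longest stop in the problem window is close to the ZK session timeout, that points back at the JVM even when the collections themselves look short.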

