lucene-solr-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: 7.2.1 cluster dies within minutes after restart
Date Mon, 29 Jan 2018 19:15:32 GMT
Hi Markus,

We are in the process of upgrading our clusters to 7.2.1 and I am not sure
I quite follow the conversation here.
Is there a simple workaround to set the ZK_CLIENT_TIMEOUT to a higher value
in the config (and it's just a default value being wrong/overridden
somewhere)?
Or is it more severe, in the sense that any value the user sets for
ZK_CLIENT_TIMEOUT is ignored completely by Solr in 7.2.1?

Thanks
SG


On Mon, Jan 29, 2018 at 3:09 AM, Markus Jelsma <markus.jelsma@openindex.io>
wrote:

> OK, I applied the patch and it is clear the timeout is 15000. solr.xml
> says 30000 if ZK_CLIENT_TIMEOUT is not set. It is unset by default in
> solr.in.sh, but bin/solr sets it to 15000. So it seems Solr's default is
> still 15000, not 30000.
>
> But, back to my topic. I see we explicitly set it in solr.in.sh to 30000.
> To be sure, I applied your patch to a production machine; all our
> collections run with 30000. So how would that explain this log line?
>
> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> 22130ms
>
> We also see these with smaller values, such as seven seconds. And is this
> actually an indicator of the problems we have?
>
> Any ideas?
>
> Many thanks,
> Markus
>
>
> -----Original message-----
> > From:Markus Jelsma <markus.jelsma@openindex.io>
> > Sent: Saturday 27th January 2018 10:03
> > To: solr-user@lucene.apache.org
> > Subject: RE: 7.2.1 cluster dies within minutes after restart
> >
> > Hello,
> >
> > I grepped for it yesterday and found nothing but 30000 in the settings,
> > but judging from the weird timeout value, you may be right. Let me apply
> > your patch early next week and check for spurious warnings.
> >
> > Another noteworthy observation for those working on cloud stability and
> > recovery: whenever this happens, some nodes are also absolutely sure to run
> > OOM. The leaders usually live longest; the replicas don't, and their heap
> > usage peaks every time, consistently.
> >
> > Thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Shawn Heisey <apache@elyograg.org>
> > > Sent: Saturday 27th January 2018 0:49
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >
> > > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > > o.a.z.ClientCnxn Client session timed out, have not heard from
> > > > server in 22130ms (although zkClientTimeOut is 30000).
> > >
> > > Are you absolutely certain that there is a setting for zkClientTimeout
> > > that is actually getting applied?  The default value in Solr's example
> > > configs is 30 seconds, but the internal default in the code (when no
> > > configuration is found) is still 15.  I have confirmed this in the
> > > code.
> > >
> > > Looks like SolrCloud doesn't log the values it's using for things like
> > > zkClientTimeout.  I think it should.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-11915
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
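
The chain of defaults described in the quoted messages can be sketched as a small model. This is only an illustration built from the numbers reported in this thread (15000 from bin/solr, 30000 from solr.xml), not a reading of Solr's source. The function name, its parameters, and the `started_via_bin_solr` flag are hypothetical and exist only for this example.

```python
from typing import Optional

# Values as reported in this thread; treat them as assumptions, not as
# facts taken from Solr's source code.
BIN_SOLR_FALLBACK_MS = 15000   # bin/solr, when ZK_CLIENT_TIMEOUT is unset
SOLR_XML_DEFAULT_MS = 30000    # solr.xml, when no value is passed in


def effective_zk_client_timeout(solr_in_sh_value: Optional[int],
                                started_via_bin_solr: bool = True) -> int:
    """Hypothetical model of which timeout ends up being applied."""
    if solr_in_sh_value is not None:
        return solr_in_sh_value          # an explicit setting wins
    if started_via_bin_solr:
        return BIN_SOLR_FALLBACK_MS      # bin/solr fills in its own default
    return SOLR_XML_DEFAULT_MS           # only reached without bin/solr


print(effective_zk_client_timeout(None))    # unset in solr.in.sh -> 15000
print(effective_zk_client_timeout(30000))   # explicit setting -> 30000
```

Under this model the 30000 in solr.xml is never reached when Solr is started through bin/solr, which matches the observation that the effective default is 15000 unless ZK_CLIENT_TIMEOUT is set explicitly.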
