lucene-solr-user mailing list archives

From philippa griggs <>
Subject Solr 5.2.1 Most solr nodes in a cluster going down at once.
Date Mon, 07 Dec 2015 16:37:59 GMT

I'm using:

Solr 5.2.1: 10 shards, each with one replica (20 nodes in total).

Zookeeper 3.4.6.

About half a year ago we upgraded to Solr 5.2.1, and since then we have been experiencing a 'wipe
out' effect where, all of a sudden, most if not all nodes go down. Sometimes they recover by
themselves, but more often than not we have to step in and restart nodes.

Nothing in the logs stands out as the cause. During the latest wipe out we noticed that 10 of
the 20 nodes had garbage collections lasting over 1 minute, all at the same time, with heap
usage spiking to 80% in some cases. We also noticed that the number of select requests run
on the Solr cluster increased just before the wipe out.
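To check the "10 out of 20 nodes at the same time" observation across future incidents, a small script over the GC logs can help. The following is a minimal sketch, assuming the pause events have already been extracted from each node's GC log into (node, start time, duration) records; the `GcPause` record, the function name, the node names and the 120-second window are hypothetical choices for illustration, not anything Solr or the JVM provides.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class GcPause:
    node: str        # hypothetical node identifier, e.g. "solr-01"
    start: float     # pause start time, in seconds since the epoch
    duration: float  # pause length, in seconds


def nodes_in_simultaneous_long_pauses(
    pauses: List[GcPause],
    threshold_s: float = 60.0,  # "over 1 minute", as in the report
    window_s: float = 120.0,    # assumed width of "at the same time"
) -> Set[str]:
    """Return the largest set of distinct nodes whose long GC pauses
    all start within window_s seconds of each other."""
    # Keep only pauses longer than the threshold, ordered by start time.
    long_pauses = sorted(
        (p for p in pauses if p.duration > threshold_s), key=lambda p: p.start
    )
    best: Set[str] = set()
    left = 0
    # Sliding window: for each pause, look back at pauses that started
    # no more than window_s seconds earlier.
    for right in range(len(long_pauses)):
        while long_pauses[right].start - long_pauses[left].start > window_s:
            left += 1
        nodes = {p.node for p in long_pauses[left:right + 1]}
        if len(nodes) > len(best):
            best = nodes
    return best
```

For example, pauses of 75 s on `solr-01` and 68 s on `solr-02` starting 10 s apart are reported together, while a 5 s pause or a long pause hours later on another node is not.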

Increasing the heap size seems to help for a while, but then it starts happening again, so
it's more of a delay than a fix. Our GC settings are -XX:+UseG1GC and -XX:+ParallelRefProcEnabled.

With our previous version of Solr (4.10.0) this didn't happen. We had nodes/shards go down,
but it was contained; with the new version they all seem to go down at around the same time.
We can't keep increasing the heap size and would like to solve this issue rather than
delay it.

Has anyone experienced something similar?

Is there a difference between the two versions in the recovery process?

Does anyone have any suggestions for a fix?

Many thanks

