lucene-solr-user mailing list archives

From Tim Chen <Tim.C...@sbs.com.au>
Subject Solr Cloud with 5 servers cluster failed due to Leader out of memory
Date Fri, 05 Aug 2016 02:14:19 GMT
Hi Guys,

Me again. :)

We have 5 Solr servers:
01-04 running Solr version 4.10 and the ZooKeeper service
05 running ZooKeeper only.

JVM max memory is set to 10G.

We have around 20 collections. Each collection has 4 shards, and each shard has 4 replicas
spread across the 4 Solr servers.

Unfortunately, most of the time all the shards have the same leader (e.g., Solr server 01).
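
To put rough numbers on this layout, here is a back-of-the-envelope sketch. It uses only the figures above and assumes that replicas are spread evenly and that, in the worst case, one server leads every shard and forwards each update to the other 3 replicas of that shard. The function name is made up for illustration.

```python
def cluster_load(collections=20, shards_per_collection=4,
                 replicas_per_shard=4, solr_servers=4):
    """Rough core and leader counts for the cluster described above."""
    total_shards = collections * shards_per_collection
    total_cores = total_shards * replicas_per_shard
    return {
        "total_shards": total_shards,
        "total_cores": total_cores,
        # Assumes an even spread of replicas over the Solr servers.
        "cores_per_server": total_cores // solr_servers,
        # Worst case: a single server is leader for every shard.
        "leaders_on_one_server": total_shards,
        # Replicas that wait on that one server for forwarded updates.
        "replicas_fed_by_that_server": total_shards * (replicas_per_shard - 1),
    }


if __name__ == "__main__":
    for name, value in cluster_load().items():
        print(f"{name}: {value}")
```

With these figures, that is 80 shards and 320 cores in total, 80 cores per server, and up to 80 shard leaders on Solr 01 feeding 240 replicas on the other three servers.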

Now, if we add a lot of documents to Solr, Solr 01 (the leader for all shards) eventually
throws an out-of-memory error in the Tomcat log and the service goes down (although port 8983
still responds to telnet).
At that point I checked the logs on Solr02, Solr03 and Solr04, and there were a lot of
"Connection time out" messages. Within another 2 minutes, the Solr service on all three of
these servers went down too!

My feeling is that when a lot of documents are being pushed in, the leader is busy indexing
and is also asking the other (non-leader) servers to index the same documents. All the
non-leader servers rely on the leader to finish indexing each new document. At a certain
point Solr01 (the leader) runs out of memory and gives up, but the other (non-leader) servers
are still waiting for the leader to respond. The whole Solr Cloud cluster breaks from here,
and no more requests are served.

A couple of thoughts:
1. If the leader goes down, it should go down completely, so that the other servers can hold
an election and choose a new leader. This would at least avoid bringing down the whole
cluster. Am I right?
2. Apparently we should not be pushing too many documents to Solr at once. How do you guys
handle this? Do you set a limit somewhere?
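
One way to picture the "limit" in point 2 is a client-side throttle: send documents in fixed-size batches and cap how many batches are in flight at once. The sketch below is only an illustration under stated assumptions, not a Solr feature. `index_throttled`, `batched`, `batch_size` and `max_in_flight` are made-up names, and `send_batch` is a hypothetical callable that would post one batch to Solr and return when Solr has responded.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_throttled(docs, send_batch, batch_size=500, max_in_flight=2):
    """Send docs in batches, with at most max_in_flight batches outstanding.

    send_batch is a hypothetical callable supplied by the caller; it is
    expected to block until the server has answered for that batch.
    """
    slots = threading.BoundedSemaphore(max_in_flight)

    def _send(batch):
        try:
            return send_batch(batch)
        finally:
            slots.release()  # free the slot whether the send succeeded or not

    futures = []
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for batch in batched(docs, batch_size):
            slots.acquire()  # the producer blocks here while all slots are busy
            futures.append(pool.submit(_send, batch))
    # Results come back in submission order; a failed batch raises here.
    return [f.result() for f in futures]
```

Because the producer blocks on the semaphore, a slow or struggling server slows the feeder down instead of letting requests pile up. Suitable values for `batch_size` and `max_in_flight` would have to be found by testing against the actual cluster.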

Thanks,
Tim



