ignite-user mailing list archives

From Denis Magda <dma...@gridgain.com>
Subject Re: Help with tuning for larger clusters
Date Mon, 02 Nov 2015 13:14:47 GMT

Thanks for the clarifications. Now we're on the same page.

It's great that the cluster is initially assembled without any issue and you
see that all 64 nodes joined the topology.

Regarding the 'rebalancing timeout' warnings, I have the following thoughts.

First, I've opened a bug that describes your case and similar ones that happen
on big clusters during rebalancing. You may want to track it:

Second, I'm not sure that this bug is 100% your case, so fixing it doesn't
guarantee that the issue on your side disappears. That's why let's
check the following.

1) As far as I remember, before we decreased the port range used by discovery,
it took significant time for you to form the cluster of 64 nodes. What are
the settings of your network (throughput, 10GB or 1GB)? How do you use these
servers? Are they already under load from other apps that reduce
network throughput? I think you should find out whether everything is OK in
this area. IMHO the situation is at least not ideal.

2) Please increase TcpCommunicationSpi.socketWriteTimeout to 15 secs (the
same value that failureDetectionTimeout has).
Actually you may want to try configuring network related parameters directly
instead of relying on failureDetectionTimeout:
- TcpCommunicationSpi.socketWriteTimeout
- TcpCommunicationSpi.connectTimeout
- TcpDiscoverySpi.socketTimeout 
- TcpDiscoverySpi.ackTimeout 
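
For illustration, here is a minimal sketch of setting the four parameters
above from Java. It needs ignite-core on the classpath. Only the 15 secs for
socketWriteTimeout comes from this message; the other three values, the class
name and the method name are assumptions and should be tuned for your network.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

// Hypothetical helper class, named only for this sketch.
class ExplicitTimeouts {
    static IgniteConfiguration configure() {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        // 15 secs, as suggested above.
        commSpi.setSocketWriteTimeout(15_000L);
        // Assumed value, not taken from this thread.
        commSpi.setConnectTimeout(15_000L);

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        // Assumed values, not taken from this thread.
        discoSpi.setSocketTimeout(15_000L);
        discoSpi.setAckTimeout(15_000L);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);
        cfg.setDiscoverySpi(discoSpi);
        return cfg;
    }
}
```

The resulting configuration can be passed to Ignition.start(cfg). The same
properties can also be set on the SPI beans in a Spring XML configuration.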

3) In some logs I see that IGFS endpoint failed to start. Please check who
occupies that port number.
[07:33:41,736][WARN ][main][IgfsServerManager] Failed to start IGFS endpoint
(will retry every 3s). Failed to bind to port (is port already in use?):
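
As a quick check, here is a minimal sketch that tests whether a port is
already taken on the local host by trying to bind to it. The class and method
names are hypothetical, and the port must be passed in because it is not
shown in the log line above. It only tells you that the port is busy; to see
which process owns it you would still need an OS tool such as netstat or lsof.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper class, named only for this sketch.
class PortCheck {
    // Returns true if binding to the port on the wildcard address fails.
    static boolean isPortInUse(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return false; // bind succeeded, so the port was free
        } catch (IOException e) {
            return true; // bind failed, most likely "address already in use"
        }
    }

    public static void main(String[] args) {
        // The port number is supplied by the caller.
        int port = Integer.parseInt(args[0]);
        System.out.println("Port " + port + " in use: " + isPortInUse(port));
    }
}
```

Run it on the node that logs the warning, passing the IGFS endpoint port from
your configuration as the only argument.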

4) Please turn off IGFS/HDFS/Hadoop completely and start the cluster. Let's
check how long it lives in the idle state. But please look into 1) first.


View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Help-with-tuning-for-larger-clusters-tp1692p1814.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
