ignite-user mailing list archives

From Sam Adams <sbad...@gmail.com>
Subject Client Node Leaves Cluster
Date Tue, 27 Oct 2015 10:26:02 GMT

I'm running a compute job that will take many days. After a few hours of
running, the client left the cluster.

I see a lot of errors like:

Failed to send local partition map to node [node=TcpDiscoveryNode
[id=6eff7f64-f0a0-455d-8e50-5d2e18ac56f8, addrs=[0:0:0:0:0:0:0:1,,], sockAddrs=[D0065-gtp-corp/,
/0:0:0:0:0:0:0:1:47500, /, /],
discPort=47500, order=5, intOrder=5, lastExchangeTime=1445859635058,
loc=false, ver=1.4.0#20150924-sha1:c2def5f6, isClient=false], exchId=null]

After a while the client seems to leave the cluster, and I see:

[07:07:09] Topology snapshot [ver=98, servers=1, clients=0, CPUs=4,

The cluster is still up, however: if I start other nodes, they join the
cluster. It's just the client that seems to have left in this instance.

On the machine I also see "GC overhead limit exceeded" errors. Could
this be causing the issue? And why is the node not removed from the cluster
if it cannot be reached?
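To check whether GC really is the cause, a minimal sketch like the one below could run in a background thread on the affected JVM and report what fraction of wall-clock time is spent in garbage collection. It is plain Java using only the standard java.lang.management API; the class and method names are hypothetical, and nothing here is Ignite-specific. HotSpot raises "GC overhead limit exceeded" when the JVM spends nearly all its time in GC while reclaiming very little heap, so a fraction close to 1 around the time the client drops out would suggest long GC pauses kept the node from responding to the cluster.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: measures how much time this JVM spends in garbage collection.
public class GcOverheadCheck {

    // Total time (ms) spent in GC across all collectors since JVM start.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of JVM uptime spent in GC, clamped to [0, 1].
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptime <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) totalGcTimeMillis() / uptime);
    }

    // Fraction of wall-clock time spent in GC during the next windowMillis ms.
    // The calling thread sleeps; other threads (e.g. the compute job) keep running.
    static double gcFractionOver(long windowMillis) throws InterruptedException {
        long gcBefore = totalGcTimeMillis();
        long start = System.nanoTime();
        Thread.sleep(windowMillis);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcDelta = totalGcTimeMillis() - gcBefore;
        return elapsedMs <= 0 ? 0.0 : Math.min(1.0, (double) gcDelta / elapsedMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long window = 1000; // arbitrary one-second sampling window
        for (int i = 0; i < 3; i++) {
            double f = gcFractionOver(window);
            System.out.printf("GC time: %.1f%% of last %d ms%n", 100 * f, window);
        }
        System.out.printf("GC time since start: %.1f%% of uptime%n", 100 * gcFraction());
    }
}
```

In a real client this loop would run on a daemon thread alongside the compute job and write each sample to the log, so the readings can be lined up with the time the topology snapshot shows the client gone.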


