Glad you managed to resolve it. What did you have to increase the values to?

Regards,
Stephen

On 14 Jan 2019, at 09:34, Alena Laas <alena.laas@cbsinteractive.com> wrote:

It seems that increasing joinTimeout and failureDetectionTimeout solved the problem.
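
For reference, a minimal sketch of where these two settings sit in a Spring XML configuration. The attached ignite-config-server.xml is not reproduced in this thread, so the layout, the use of the Kubernetes IP finder, and both timeout values (60000 ms) are assumptions for illustration only, not the values we actually ended up with.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Assumed placeholder value in milliseconds; the Ignite default is 10000. -->
        <property name="failureDetectionTimeout" value="60000"/>

        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <!-- Assumed placeholder value in milliseconds; 0 (the default) means wait indefinitely. -->
                <property name="joinTimeout" value="60000"/>

                <property name="ipFinder">
                    <!-- Assumed IP finder; namespace and service name settings are omitted here. -->
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>
                </property>
            </bean>
        </property>
    </bean>
</beans>
```

failureDetectionTimeout is a property of IgniteConfiguration itself, while joinTimeout belongs to TcpDiscoverySpi, so the two go at different levels of the file.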

On Fri, Jan 11, 2019 at 5:24 PM Alena Laas <alena.laas@cbsinteractive.com> wrote:
I have attached part of the log with "node failed" events (100.99.129.141 is the IP of the restarted node).

These events repeat until, after about 40 minutes to an hour, the node suddenly connects to the cluster.

Could you explain why this is happening?

On Thu, Jan 10, 2019 at 7:54 PM Alena Laas <alena.laas@cbsinteractive.com> wrote:
We are using an Azure AKS cluster.

We kill the pod using either the Kubernetes dashboard or kubectl (kubectl delete pods <name>); the result is the same either way.

Do you need any more logs from us?

On Thu, Jan 10, 2019 at 7:28 PM Stephen Darlington <stephen.darlington@gridgain.com> wrote:
What kind of environment are you using? A public cloud? Your own data centre? And how are you killing the pod?

I fired up a cluster using Minikube and your configuration and it worked as far as I could see. (I deleted the pod using the dashboard, for what that’s worth.)

Regards,
Stephen

On 10 Jan 2019, at 14:20, Alena Laas <alena.laas@cbsinteractive.com> wrote:



---------- Forwarded message ---------
From: Alena Laas <alena.laas@cbsinteractive.com>
Date: Thu, Jan 10, 2019 at 5:13 PM
Subject: Ignite in Kubernetes not works correctly
To: <user@ignite.apache.org>
Cc: Vadim Shcherbakov <vadim.shcherbakov@cbsinteractive.com>


Hello!
Could you please help with a problem running Ignite in a Kubernetes cluster?

When we start two Ignite nodes at the same time, or scale the Deployment from 1 to 2, everything is fine: both nodes are visible in the Ignite cluster (we use the web console to check).

But after we kill the pod running one of the nodes and it restarts, that node is no longer visible in the Ignite cluster. Moreover, the logs from the restarted node contain very little:
[13:32:57]    __________  ________________
[13:32:57]   /  _/ ___/ |/ /  _/_  __/ __/
[13:32:57]  _/ // (7 7    // /  / / / _/
[13:32:57] /___/\___/_/|_/___/ /_/ /___/
[13:32:57]
[13:32:57] ver. 2.7.0#20181130-sha1:256ae401
[13:32:57] 2018 Copyright(C) Apache Software Foundation
[13:32:57]
[13:32:57] Ignite documentation: http://ignite.apache.org
[13:32:57]
[13:32:57] Quiet mode.
[13:32:57] ^-- Logging to file '/opt/ignite/apache-ignite/work/log/ignite-7d323675.0.log'
[13:32:57] ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[13:32:57] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[13:32:57]
[13:32:57] OS: Linux 4.15.0-1036-azure amd64
[13:32:57] VM information: OpenJDK Runtime Environment 1.8.0_181-b13 Oracle Corporation OpenJDK 64-Bit Server VM 25.181-b13
[13:32:57] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[13:32:57] Configured plugins:
[13:32:57] ^-- None
[13:32:57]
[13:32:57] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]]]
[13:32:58] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[13:32:58] Security status [authentication=off, tls/ssl=off]

And the logs from the remaining node show the server count flipping back and forth between 2 and 1:
[14:02:05] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:15] Topology snapshot [ver=234, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:15] Topology snapshot [ver=235, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:20] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:30] Topology snapshot [ver=236, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:30] Topology snapshot [ver=237, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:35] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:45] Topology snapshot [ver=238, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:45] Topology snapshot [ver=239, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:50] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:00] Topology snapshot [ver=240, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:00] Topology snapshot [ver=241, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:16] Topology snapshot [ver=242, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:16] Topology snapshot [ver=243, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:21] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:31] Topology snapshot [ver=244, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:31] Topology snapshot [ver=245, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:36] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:46] Topology snapshot [ver=246, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:46] Topology snapshot [ver=247, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:51] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:04:01] Topology snapshot [ver=248, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:04:01] Topology snapshot [ver=249, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:04:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]

I am attaching our config file for the Ignite server and the YAML files for Kubernetes. Everything there was set up according to your official documentation. The Ignite version we are currently trying is 2.7.0.
Looking forward to getting an answer from you.

-- 
ALENA LAAS
SOFTWARE ENGINEER (JAVA)
CNET Content Solutions
OFFICE +7.495.967.1201 FAX +7.495.967.1203    
5 Letnikovskaya str., Moscow, Russia, 115114
CNET Content Solutions


<ignite-config-server.xml><fcat-ignite-stage.yaml>



