ignite-user mailing list archives

From: Stephen Darlington <stephen.darling...@gridgain.com>
Subject: Re: Ignite in Kubernetes not works correctly
Date: Mon, 14 Jan 2019 11:21:41 GMT
Glad you managed to resolve it. What did you have to increase the values to?

Regards,
Stephen
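For readers who hit the same symptom: the fix reported in the quoted reply below was to raise failureDetectionTimeout (on IgniteConfiguration) and joinTimeout (on TcpDiscoverySpi). The thread never says which values were used, so the numbers in this Spring XML sketch are placeholders, not recommendations. The namespace ("default") and service name ("ignite") given to TcpDiscoveryKubernetesIpFinder are also assumptions; match them to your own Kubernetes Service. That IP finder ships in the ignite-kubernetes module, which must be enabled on the node.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Placeholder value in ms (default is 10000): how long discovery and
         communication wait before treating an unresponsive node as failed. -->
    <property name="failureDetectionTimeout" value="60000"/>

    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <!-- Placeholder value in ms: how long a joining node keeps trying to join. -->
        <property name="joinTimeout" value="60000"/>

        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
            <!-- Assumed names: replace with your own namespace and Service. -->
            <property name="namespace" value="default"/>
            <property name="serviceName" value="ignite"/>
          </bean>
        </property>
      </bean>
    </property>
  </bean>
</beans>
```

Both timeouts are in milliseconds. Raising failureDetectionTimeout makes the cluster more tolerant of a slow or briefly unreachable pod, at the cost of noticing genuinely dead nodes more slowly.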

> On 14 Jan 2019, at 09:34, Alena Laas <alena.laas@cbsinteractive.com> wrote:
> 
> It seems that increasing joinTimeout and failureDetectionTimeout solved the problem.
> 
> On Fri, Jan 11, 2019 at 5:24 PM Alena Laas <alena.laas@cbsinteractive.com> wrote:
> I have attached part of the log with the "node failed" events (100.99.129.141 is the IP of the restarted node).
> 
> These events repeat until, suddenly, after about 40 minutes to an hour, the node connects to the cluster.
> 
> Could you explain why this is happening?
> 
> On Thu, Jan 10, 2019 at 7:54 PM Alena Laas <alena.laas@cbsinteractive.com> wrote:
> We are using an Azure AKS cluster.
> 
> We kill the pod either from the Kubernetes dashboard or with kubectl (kubectl delete pods <name>); the result is the same either way.
> 
> Do you need any more logs from us?
> 
> On Thu, Jan 10, 2019 at 7:28 PM Stephen Darlington <stephen.darlington@gridgain.com> wrote:
> What kind of environment are you using? A public cloud? Your own data centre? And how are you killing the pod?
> 
> I fired up a cluster using Minikube and your configuration and it worked as far as I could see. (I deleted the pod using the dashboard, for what that’s worth.)
> 
> Regards,
> Stephen
> 
>> On 10 Jan 2019, at 14:20, Alena Laas <alena.laas@cbsinteractive.com> wrote:
>> 
>> 
>> 
>> ---------- Forwarded message ---------
>> From: Alena Laas <alena.laas@cbsinteractive.com>
>> Date: Thu, Jan 10, 2019 at 5:13 PM
>> Subject: Ignite in Kubernetes not works correctly
>> To: <user@ignite.apache.org>
>> Cc: Vadim Shcherbakov <vadim.shcherbakov@cbsinteractive.com>
>> 
>> 
>> Hello!
>> Could you please help us with a problem running Ignite in a Kubernetes cluster?
>> 
>> When we start two Ignite nodes at the same time, or scale the Deployment from 1 to 2 replicas, everything is fine: both nodes are visible in the Ignite cluster (we check this with Web Console).
>> 
>> But after we kill the pod running one of the nodes and it restarts, the node no longer appears in the Ignite cluster. Moreover, the log from the restarted node contains very little:
>> [13:32:57]    __________  ________________ 
>> [13:32:57]   /  _/ ___/ |/ /  _/_  __/ __/ 
>> [13:32:57]  _/ // (7 7    // /  / / / _/   
>> [13:32:57] /___/\___/_/|_/___/ /_/ /___/  
>> [13:32:57] 
>> [13:32:57] ver. 2.7.0#20181130-sha1:256ae401
>> [13:32:57] 2018 Copyright(C) Apache Software Foundation
>> [13:32:57] 
>> [13:32:57] Ignite documentation: http://ignite.apache.org
>> [13:32:57] 
>> [13:32:57] Quiet mode.
>> [13:32:57]   ^-- Logging to file '/opt/ignite/apache-ignite/work/log/ignite-7d323675.0.log'
>> [13:32:57]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>> [13:32:57]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>> [13:32:57] 
>> [13:32:57] OS: Linux 4.15.0-1036-azure amd64
>> [13:32:57] VM information: OpenJDK Runtime Environment 1.8.0_181-b13 Oracle Corporation OpenJDK 64-Bit Server VM 25.181-b13
>> [13:32:57] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
>> [13:32:57] Configured plugins:
>> [13:32:57]   ^-- None
>> [13:32:57] 
>> [13:32:57] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]]]
>> [13:32:58] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
>> [13:32:58] Security status [authentication=off, tls/ssl=off]
>> 
>> Meanwhile, the log from the surviving node shows the server count flipping back and forth between 2 and 1:
>> [14:02:05] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:02:15] Topology snapshot [ver=234, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:02:15] Topology snapshot [ver=235, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:02:20] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:02:30] Topology snapshot [ver=236, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:02:30] Topology snapshot [ver=237, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:02:35] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:02:45] Topology snapshot [ver=238, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:02:45] Topology snapshot [ver=239, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:02:50] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:03:00] Topology snapshot [ver=240, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:03:00] Topology snapshot [ver=241, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:03:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:03:16] Topology snapshot [ver=242, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:03:16] Topology snapshot [ver=243, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:03:21] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:03:31] Topology snapshot [ver=244, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:03:31] Topology snapshot [ver=245, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:03:36] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:03:46] Topology snapshot [ver=246, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:03:46] Topology snapshot [ver=247, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:03:51] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> [14:04:01] Topology snapshot [ver=248, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>> [14:04:01] Topology snapshot [ver=249, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>> [14:04:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
>> 
>> I am attaching our Ignite server configuration file and the Kubernetes YAML files. Everything there follows your official documentation. The Ignite version we are currently using is 2.7.0.
>> Looking forward to getting an answer from you.
>> 
>> -- 
>> ALENA LAAS
>> SOFTWARE ENGINEER (JAVA)
>> CNET Content Solutions
>> OFFICE +7.495.967.1201 FAX +7.495.967.1203    
>> 5 Letnikovskaya str., Moscow, Russia, 115114
>> 
>> 
>> <ignite-config-server.xml><fcat-ignite-stage.yaml>


