ignite-user mailing list archives

From Alena Laas <alena.l...@cbsinteractive.com>
Subject Re: Ignite in Kubernetes not works correctly
Date Mon, 14 Jan 2019 12:01:33 GMT
failureDetectionTimeout - 60000
joinTimeout - 120000
I saw these recommendations in one of the answers on your forum.
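
For reference, a minimal sketch of where these two settings would sit in a
Spring XML configuration. This is not the attached ignite-config-server.xml
(it is not reproduced in this archive); the Kubernetes IP finder bean is an
assumption based on the setup following the official documentation, and its
own properties are left at their defaults. Both values are in milliseconds.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Value reported in this thread: 60 seconds. -->
        <property name="failureDetectionTimeout" value="60000"/>

        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <!-- Value reported in this thread: 120 seconds. -->
                <property name="joinTimeout" value="120000"/>

                <!-- Assumed: Kubernetes-based discovery, default settings. -->
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>
                </property>
            </bean>
        </property>
    </bean>
</beans>
```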

On Mon, Jan 14, 2019 at 2:21 PM Stephen Darlington <
stephen.darlington@gridgain.com> wrote:

> Glad you managed to resolve it. What did you have to increase the values
> to?
>
> Regards,
> Stephen
>
> On 14 Jan 2019, at 09:34, Alena Laas <alena.laas@cbsinteractive.com>
> wrote:
>
> It seems that increasing joinTimeout and failureDetectionTimeout solved
> the problem.
>
> On Fri, Jan 11, 2019 at 5:24 PM Alena Laas <alena.laas@cbsinteractive.com>
> wrote:
>
>> I have attached part of the log with "node failed" events (100.99.129.141
>> is the IP of the restarted node).
>>
>> These events repeat until, after about 40 minutes to an hour, the node
>> suddenly connects to the cluster.
>>
>> Could you explain why this is happening?
>>
>> On Thu, Jan 10, 2019 at 7:54 PM Alena Laas <alena.laas@cbsinteractive.com>
>> wrote:
>>
>>> We are using an Azure AKS cluster.
>>>
>>> We kill the pod using either the Kubernetes dashboard or kubectl
>>> (kubectl delete pods <name>); the result is the same either way.
>>>
>>> Do you need any more logs from us?
>>>
>>> On Thu, Jan 10, 2019 at 7:28 PM Stephen Darlington <
>>> stephen.darlington@gridgain.com> wrote:
>>>
>>>> What kind of environment are you using? A public cloud? Your own data
>>>> centre? And how are you killing the pod?
>>>>
>>>> I fired up a cluster using Minikube and your configuration and it
>>>> worked as far as I could see. (I deleted the pod using the dashboard, for
>>>> what that’s worth.)
>>>>
>>>> Regards,
>>>> Stephen
>>>>
>>>> On 10 Jan 2019, at 14:20, Alena Laas <alena.laas@cbsinteractive.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Alena Laas <alena.laas@cbsinteractive.com>
>>>> Date: Thu, Jan 10, 2019 at 5:13 PM
>>>> Subject: Ignite in Kubernetes not works correctly
>>>> To: <user@ignite.apache.org>
>>>> Cc: Vadim Shcherbakov <vadim.shcherbakov@cbsinteractive.com>
>>>>
>>>>
>>>> Hello!
>>>> Could you please help with a problem running Ignite in a Kubernetes
>>>> cluster?
>>>>
>>>> When we start 2 Ignite nodes at the same time, or scale the Deployment
>>>> from 1 to 2, everything is fine: both nodes are visible in the Ignite
>>>> cluster (we use the web console to check).
>>>>
>>>> But after we kill the pod running one node and it restarts, the node is
>>>> no longer visible in the Ignite cluster. Moreover, the logs from the
>>>> restarted node show very little:
>>>> [13:32:57]    __________  ________________
>>>> [13:32:57]   /  _/ ___/ |/ /  _/_  __/ __/
>>>> [13:32:57]  _/ // (7 7    // /  / / / _/
>>>> [13:32:57] /___/\___/_/|_/___/ /_/ /___/
>>>> [13:32:57]
>>>> [13:32:57] ver. 2.7.0#20181130-sha1:256ae401
>>>> [13:32:57] 2018 Copyright(C) Apache Software Foundation
>>>> [13:32:57]
>>>> [13:32:57] Ignite documentation: http://ignite.apache.org
>>>> [13:32:57]
>>>> [13:32:57] Quiet mode.
>>>> [13:32:57] ^-- Logging to file
>>>> '/opt/ignite/apache-ignite/work/log/ignite-7d323675.0.log'
>>>> [13:32:57] ^-- Logging by 'JavaLogger [quiet=true, config=null]'
>>>> [13:32:57] ^-- To see **FULL** console log here add
>>>> -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
>>>> [13:32:57]
>>>> [13:32:57] OS: Linux 4.15.0-1036-azure amd64
>>>> [13:32:57] VM information: OpenJDK Runtime Environment 1.8.0_181-b13
>>>> Oracle Corporation OpenJDK 64-Bit Server VM 25.181-b13
>>>> [13:32:57] Please set system property '-Djava.net.preferIPv4Stack=true'
>>>> to avoid possible problems in mixed environments.
>>>> [13:32:57] Configured plugins:
>>>> [13:32:57] ^-- None
>>>> [13:32:57]
>>>> [13:32:57] Configured failure handler:
>>>> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
>>>> super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]]]
>>>> [13:32:58] Message queue limit is set to 0 which may lead to potential
>>>> OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due
>>>> to message queues growth on sender and receiver sides.
>>>> [13:32:58] Security status [authentication=off, tls/ssl=off]
>>>>
>>>> And the logs from the remaining node alternate between reporting 2
>>>> servers and 1 server:
>>>> [14:02:05] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:02:15] Topology snapshot [ver=234, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:02:15] Topology snapshot [ver=235, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:02:20] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:02:30] Topology snapshot [ver=236, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:02:30] Topology snapshot [ver=237, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:02:35] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:02:45] Topology snapshot [ver=238, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:02:45] Topology snapshot [ver=239, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:02:50] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:03:00] Topology snapshot [ver=240, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:03:00] Topology snapshot [ver=241, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:03:06] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:03:16] Topology snapshot [ver=242, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:03:16] Topology snapshot [ver=243, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:03:21] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:03:31] Topology snapshot [ver=244, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:03:31] Topology snapshot [ver=245, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:03:36] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:03:46] Topology snapshot [ver=246, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:03:46] Topology snapshot [ver=247, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:03:51] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>> [14:04:01] Topology snapshot [ver=248, locNode=a5eb30e1, servers=2,
>>>> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
>>>> [14:04:01] Topology snapshot [ver=249, locNode=a5eb30e1, servers=1,
>>>> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
>>>> [14:04:06] Joining node doesn't have encryption data
>>>> [node=7d323675-bc0b-4507-affb-672b25766201]
>>>>
>>>> I am attaching our config file for the Ignite server and the YAML files
>>>> for Kubernetes. Everything there was done according to your official
>>>> documentation. The Ignite version we are currently trying is 2.7.0.
>>>> Looking forward to your answer.
>>>>
>>>> --
>>>>
>>>> *ALENA LAAS* SOFTWARE ENGINEER (JAVA)
>>>> CNET Content Solutions
>>>> OFFICE +7.495.967.1201 FAX +7.495.967.1203
>>>> 5 Letnikovskaya str., Moscow, Russia, 115114
>>>> [image: CNET Content Solutions]
>>>>
>>>>
>>>> <ignite-config-server.xml><fcat-ignite-stage.yaml>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>
>

-- 

*ALENA LAAS* SOFTWARE ENGINEER (JAVA)
CNET Content Solutions
OFFICE +7.495.967.1201 FAX +7.495.967.1203
5 Letnikovskaya str., Moscow, Russia, 115114
[image: CNET Content Solutions]
