ignite-user mailing list archives

From Akash Shinde <akashshi...@gmail.com>
Subject Re: Server Nodes Stopped Unexpectedly
Date Mon, 09 Sep 2019 13:34:04 GMT
Hi,
Sorry for the late reply; I was out of town.
I am trying to fetch the logs. Meanwhile, could you please answer the
questions from my last mail?

Thanks,
Akash

On Thu, Aug 29, 2019 at 6:51 PM Evgenii Zhuravlev <e.zhuravlev.wk@gmail.com>
wrote:

> Hi,
> Can you please share the new logs? They will help us understand the
> possible cause of the issue.
>
> Thanks,
> Evgenii
>
> On Wed, Aug 28, 2019 at 5:56 PM Akash Shinde <akashshinde@gmail.com> wrote:
>
>> Hi,
>>
>> I have now set the failure detection timeout to 120000 ms, and I am
>> still getting this error message intermittently on Ignite 2.6.
>> It could be a network issue, but I have not been able to confirm that
>> the network is the cause.
>>
>> 1) What are all the possible reasons for the following error? Listing
>> them might help to narrow down the issue.
>>  [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>>
>> 2) Will upgrading to the latest Ignite version (2.7.5 or 2.7.6) solve
>> this problem?
>>
>> 3) How do you monitor the network? Can you please suggest a tool?
>>
>> 4) I understand that a node gets segmented because of a long GC pause or
>> a network connectivity problem. Is my understanding correct?
>>
>> 5) What is the purpose of the networkTimeout configuration property? In
>> my case it is set to 10000.
>>
>> Regards,
>> Akash
>>
>> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <
>> e.zhuravlev.wk@gmail.com> wrote:
>>
>>> >Does network issue make JVM  halt?
>>> There is a failureDetectionTimeout, which lets the other nodes in the
>>> cluster detect that a node is unreachable and exclude that node from the
>>> topology. So I believe it could be something like a temporary network
>>> problem. I would recommend adding some network monitoring to be prepared
>>> for the next failure.
>>>
>>> Best Regards,
>>> Evgenii
>>>
>>> On Fri, Jul 26, 2019 at 4:01 PM Akash Shinde <akashshinde@gmail.com> wrote:
>>>
>>>> This issue is not consistent; it occurs only sometimes. Does a network
>>>> issue make the JVM halt? As per my understanding, a node will disconnect
>>>> from the cluster if a network issue happens.
>>>> But in this case multiple JVMs were terminated. Can it be a bug in
>>>> Ignite 2.6?
>>>>
>>>> Thanks,
>>>> Akash
>>>>
>>>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>
>>>>> I don't see any specific errors in the logs. To me it looks like
>>>>> network problems; moreover, the client nodes print messages about
>>>>> connection problems. Is this issue reproducible?
>>>>> Evgenii
>>>>>
>>>>> On Fri, Jul 26, 2019 at 9:21 AM Akash Shinde <akashshinde@gmail.com> wrote:
>>>>>
>>>>>> Can someone please help me with this issue?
>>>>>>
>>>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <akashshinde@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Please find attached the logs from all server and client nodes. The
>>>>>>> GC logs for each node are attached as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Akash
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can you please share the full logs, beginning at node start, from
>>>>>>>> all nodes in the cluster?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> On Tue, Jul 23, 2019 at 4:51 PM Akash Shinde <akashshinde@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am using Ignite 2.6. I have created a cluster of seven
>>>>>>>>> server nodes and three client nodes. Five of the seven server
>>>>>>>>> nodes stopped unexpectedly with the error log lines below.
>>>>>>>>> I have attached the logs of two such server nodes.
>>>>>>>>>
>>>>>>>>> FailureDetectionTimeout is set to 30000 ms in the Ignite
>>>>>>>>> configuration.
>>>>>>>>> The network timeout is left at its default.
>>>>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>>>>
>>>>>>>>> I checked the GC logs, but it does not seem to be a GC pause
>>>>>>>>> issue. I have attached the GC logs too.
>>>>>>>>>
>>>>>>>>> 1) Can someone please help me identify the reason for this
>>>>>>>>> issue?
>>>>>>>>> 2) Are there any specific reasons that cause this issue, or is
>>>>>>>>> it a bug in Ignite 2.6?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *ERROR LOG LINES*
>>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR  - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>> java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>>>> at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR  - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Akash
>>>>>>>>>
>>>>>>>>
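
The thread recommends adding network monitoring but names no tool. As one rough sketch only, and not something proposed by anyone in the thread, a small Java probe could time TCP connects from one host to another node's discovery port and log the result, so that connectivity gaps can later be lined up against the timestamps of the SYSTEM_WORKER_TERMINATION errors. The class and method names (DiscoveryPortProbe, probeMillis, classify) are hypothetical, the default port 47500 assumes the node uses TcpDiscoverySpi with its default port, and the 2000 ms connect timeout and 500 ms warning threshold are arbitrary values chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

public class DiscoveryPortProbe {

    /** Returns the TCP connect time in milliseconds, or -1 if the connect failed or timed out. */
    public static long probeMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, unreachable hosts, and connect timeouts.
            return -1;
        }
    }

    /** Maps a probe result to a coarse status that is easy to grep for in a log. */
    public static String classify(long latencyMs, long warnThresholdMs) {
        if (latencyMs < 0) {
            return "UNREACHABLE";
        }
        return latencyMs > warnThresholdMs ? "SLOW" : "OK";
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults: local host and port 47500; pass real values as arguments.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47500;

        // Take ten samples, one per second, and print a timestamped status line for each.
        for (int i = 0; i < 10; i++) {
            long ms = probeMillis(host, port, 2000);
            System.out.printf("%s %s:%d %s (%d ms)%n",
                    Instant.now(), host, port, classify(ms, 500), ms);
            Thread.sleep(1000);
        }
    }
}
```

Run it as, for example, `java DiscoveryPortProbe <server-host> 47500` from each client or server host. A probe like this only shows whether a TCP connection can be opened; it does not measure packet loss or GC pauses, and the probed node may log the aborted connection attempts.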
