ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: System Worker Failure Handler on local laptop
Date Thu, 27 Dec 2018 12:52:17 GMT
Nikolay,

Yes, the fix is already in master. Looks like I was wrong: in your case the
failure handler is triggered by 'Node is stopping: grid-2'. Can you please
share the full trace?



Thu, 27 Dec 2018 at 12:41, Nikolay Izhikov <nizhikov@apache.org>:

> Alexey,
>
> Is the fix for this issue already in master?
> I ran the tests on current master.
>
> > Should we somehow announce it on the user-list or highlight on readme.io?
>
> I don't think our users will be happy to be stuck with this behavior in
> production.
>
> Do I understand you correctly:
> if someone uses the 2.7 release and the Ignite process slows down for a few
> seconds for any reason (low-end hardware, VM pause, other processes grabbing
> the resources), then the Ignite node will be stopped?
>
> > This is the issue I mentioned in "Critical worker threads liveness
> > checking drawbacks" topic
>
> Thanks for the link, I will check it out.
>
> Thu, 27 Dec 2018 at 12:24, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
>
> > Hi Nikolay,
> >
> > This is the issue I mentioned in the "Critical worker threads liveness
> > checking drawbacks" topic, which I was expecting to be included in
> > Ignite 2.7, but it was not. To work around the issue, you should set
> > DataStorageConfiguration#setCheckpointReadLockTimeout to 0.
> >
> > Should we somehow announce it on the user-list or highlight on readme.io?
> >
> > Thu, 27 Dec 2018 at 11:57, Nikolay Izhikov <nizhikov@apache.org>:
> >
> > > Hello, Igniters.
> > >
> > > I ran into an issue with the critical system worker failure handler.
> > > I just run `IgniteDataFrameSuite` and it terminates on a random test.
> > > My laptop doesn't have bleeding-edge hardware, so tests can take a
> > > significant amount of time.
> > > Looks like our watchdog is too aggressive in a development environment.
> > >
> > > Can you please help me? What should I do to configure or turn off the
> > > watchdog?
> > > Should we relax it a little bit? At least for a test environment.
> > >
> > > The error message is the following:
> > >
> > > ```
> > > [2018-12-27 11:40:23,597][ERROR][exchange-worker-#5547%grid-2%][root]
> > > Critical system error detected. Will be handled accordingly to configured
> > > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> > > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> > > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> > > failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> > > o.a.i.IgniteCheckedException: Node is stopping: grid-2]]
> > > class org.apache.ignite.IgniteCheckedException: Node is stopping: grid-2
> > > ```
> > >
> >
>
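
For reference, the workaround suggested in the thread (setting DataStorageConfiguration#setCheckpointReadLockTimeout to 0) could look roughly like the sketch below. It assumes ignite-core 2.7 or later on the classpath. The class name and the printed message are made up for illustration, and the instance name is only borrowed from the log above. The thread does not spell out what 0 means; the assumption here is that it turns the checkpoint read lock timeout check off.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class name, used only for this illustration.
public class CheckpointReadLockTimeoutWorkaround {
    public static void main(String[] args) {
        // Workaround from the thread: set the checkpoint read lock timeout to 0.
        // Assumption: 0 disables the timeout check rather than making it fire immediately.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setCheckpointReadLockTimeout(0);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIgniteInstanceName("grid-2") // name taken from the log above, illustrative only
            .setDataStorageConfiguration(storageCfg);

        // Ignite is AutoCloseable, so the node is stopped when the block exits.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Node started: " + ignite.name());
        }
    }
}
```

In a test suite such as `IgniteDataFrameSuite`, the same setter would presumably be applied wherever the test's IgniteConfiguration is built. Note that the trace above reports SYSTEM_WORKER_TERMINATION with "Node is stopping", so this setting may not be what resolves that particular failure.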
