ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: System Worker Failure Handler on local laptop
Date Thu, 27 Dec 2018 18:27:09 GMT
Folks,

What are the current timeouts? We need to know how likely these failures are
in a dev environment. This affects usability.

--
Denis
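Denis's question about the current timeouts ties into the workaround Alexey suggests in the quoted thread: setting `DataStorageConfiguration#setCheckpointReadLockTimeout` to 0. The following is a minimal sketch, assuming Ignite 2.7 on the classpath, that only builds such a configuration and makes the value explicit. The class name `CheckpointLockWorkaround` and its `devConfig()` helper are hypothetical.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckpointLockWorkaround {
    /** Hypothetical helper: a node configuration carrying the workaround from the thread. */
    public static IgniteConfiguration devConfig() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Workaround suggested in the thread for Ignite 2.7:
        // set the checkpoint read lock timeout to 0.
        storageCfg.setCheckpointReadLockTimeout(0);

        return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg = devConfig();

        // Only builds the configuration; no node is started here.
        System.out.println("checkpointReadLockTimeout = "
            + cfg.getDataStorageConfiguration().getCheckpointReadLockTimeout());
    }
}
```

To launch a node with this setting, the configuration would be passed to `Ignition.start(cfg)`.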

On Thu, Dec 27, 2018 at 4:59 AM Alexey Goncharuk <alexey.goncharuk@gmail.com>
wrote:

> Nikolay,
>
> Yes, the fix is already in master. It looks like I was wrong: in your case,
> the failure handler is triggered by 'Node is stopping: grid-2'. Can you
> please share the full trace?
>
>
>
> On Thu, 27 Dec 2018 at 12:41, Nikolay Izhikov <nizhikov@apache.org> wrote:
>
> > Alexey,
> >
> > Is the fix for this issue already in master?
> > I ran the tests on the current master.
> >
> > > Should we somehow announce it on the user list or highlight it on
> > > readme.io?
> >
> > I don't think our users will be happy to be stuck with this behavior in
> > production.
> >
> > Do I understand you correctly:
> > if someone uses the 2.7 release and the Ignite process slows down for a
> > few seconds for any reason (low-end hardware, a VM pause, other processes
> > grabbing resources), then the Ignite node will be stopped?
> >
> > > This is the issue I mentioned in the "Critical worker threads liveness
> > > checking drawbacks" topic
> >
> > Thanks for the link, I will check it out.
> >
> > On Thu, 27 Dec 2018 at 12:24, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> >
> > > Hi Nikolay,
> > >
> > > This is the issue I mentioned in the "Critical worker threads liveness
> > > checking drawbacks" topic, which I was expecting to be included in
> > > Ignite 2.7, but it was not. To work around the issue, you should set
> > > DataStorageConfiguration#setCheckpointReadLockTimeout to 0.
> > >
> > > Should we somehow announce it on the user list or highlight it on
> > > readme.io?
> > >
> > > On Thu, 27 Dec 2018 at 11:57, Nikolay Izhikov <nizhikov@apache.org> wrote:
> > >
> > > > Hello, Igniters.
> > > >
> > > > I ran into an issue with the critical system worker failure handler.
> > > > I just ran `IgniteDataFrameSuite`, and it terminates on a random test.
> > > > My laptop doesn't have bleeding-edge hardware, so tests can take a
> > > > significant amount of time.
> > > > It looks like our watchdog is too aggressive in a development
> > > > environment.
> > > >
> > > > Could you please help me? What should I do to configure or turn off
> > > > the watchdog?
> > > > Should we relax it a little? At least for a test environment.
> > > >
> > > > The error output contains the following message:
> > > >
> > > > ```
> > > > [2018-12-27 11:40:23,597][ERROR][exchange-worker-#5547%grid-2%][root]
> > > > Critical system error detected. Will be handled accordingly to
> > > > configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
> > > > timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> > > > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> > > > failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> > > > o.a.i.IgniteCheckedException: Node is stopping: grid-2]]
> > > > class org.apache.ignite.IgniteCheckedException: Node is stopping: grid-2
> > > > ```
> > > >
> > >
> >
>
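Nikolay asks how to configure or turn off the watchdog. The following is a minimal sketch, assuming Ignite 2.7 on the classpath. `defaultLikeHandler()` rebuilds explicitly the handler printed in the log (`StopNodeOrHaltFailureHandler` with `tryStop=false`, `timeout=0` and the two ignored failure types), and `devConfig()` swaps in `NoOpFailureHandler` for a development or test run. The class and method names are hypothetical, and whether a no-op handler suits a particular test suite is a judgment call.

```java
import java.util.EnumSet;

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureHandler;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.NoOpFailureHandler;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerConfig {
    /** Same shape as the handler in the log: halt without trying to stop, ignore two failure types. */
    public static FailureHandler defaultLikeHandler() {
        StopNodeOrHaltFailureHandler hnd = new StopNodeOrHaltFailureHandler(false, 0);

        // Blocked workers and critical operation timeouts are reported but do not trigger the handler.
        hnd.setIgnoredFailureTypes(EnumSet.of(
            FailureType.SYSTEM_WORKER_BLOCKED,
            FailureType.SYSTEM_CRITICAL_OPERATION_TIMEOUT));

        return hnd;
    }

    /** Hypothetical dev/test configuration: critical failures no longer stop or halt the node. */
    public static IgniteConfiguration devConfig() {
        return new IgniteConfiguration().setFailureHandler(new NoOpFailureHandler());
    }

    public static void main(String[] args) {
        // Only builds the objects; no node is started here.
        System.out.println("default-like handler: " + defaultLikeHandler());
        System.out.println("dev handler: " + devConfig().getFailureHandler());
    }
}
```

Per the log, the default handler already ignores SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT; the failure reported here has type SYSTEM_WORKER_TERMINATION. With `NoOpFailureHandler`, a worker that genuinely dies no longer stops the node, so a test may hang instead of failing fast.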
