ignite-dev mailing list archives

From Andrey Gura <ag...@apache.org>
Subject Re: GridDhtInvalidPartitionException takes the cluster down
Date Mon, 25 Mar 2019 12:52:03 GMT
Failure handlers were introduced to keep the cluster from hanging; they
kill the affected nodes instead.

If a critical worker was terminated by GridDhtInvalidPartitionException,
then your node is unable to work anymore.

An unexpected cluster shutdown, with the reasons that failure handlers
write to the logs, is better than hanging. So the answer is NO. We mustn't
disable failure handlers.

On Mon, Mar 25, 2019 at 2:47 PM Roman Shtykh <rshtykh@yahoo.com.invalid> wrote:
>
> If it sticks to the behavior we had before introducing the failure handler, I think it's
> better to have it disabled instead of killing the whole cluster, as in my case, and create a
> parent issue for those ten bugs. Pavel, thanks for the suggestion!
>
>
>
>     On Monday, March 25, 2019, 7:07:20 p.m. GMT+9, Nikolay Izhikov <nizhikov@apache.org> wrote:
>
>  Guys.
>
> We should fix the SYSTEM_WORKER_TERMINATION once and for all.
> It seems we have had ten or more "cluster shutdown" bugs with this subsystem
> since it was introduced.
>
> Should we disable it by default in 2.7.5?
>
>
> On Mon, Mar 25, 2019 at 13:04, Pavel Kovalenko <jokserfn@gmail.com> wrote:
>
> > Hi Roman,
> >
> > I think this InvalidPartition case can simply be handled
> > in the GridCacheTtlManager.expire method.
> > As a workaround, a custom FailureHandler can be configured that will not
> > stop a node when such an exception is thrown.
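The workaround suggested above depends on one decision: does the failure's cause chain contain GridDhtInvalidPartitionException? A minimal sketch of that check follows, in plain Java with no Ignite dependency. The class InvalidPartitionFilter and its methods are hypothetical names introduced here for illustration; they are not part of Ignite. Matching by simple class name is an assumption made so the sketch does not depend on the exception's package.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper (not part of Ignite): decides whether a failure should stop the node. */
final class InvalidPartitionFilter {
    /** Simple class name of the exception discussed in this thread. */
    static final String TOLERATED = "GridDhtInvalidPartitionException";

    private InvalidPartitionFilter() {
        // Static helpers only.
    }

    /** Returns true if err, or any throwable in its cause chain, has the given simple class name. */
    public static boolean hasCauseNamed(Throwable err, String simpleName) {
        // The identity set stops the walk if the cause chain is cyclic.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());

        for (Throwable t = err; t != null && seen.add(t); t = t.getCause()) {
            if (t.getClass().getSimpleName().equals(simpleName))
                return true;
        }

        return false;
    }

    /** Returns true if the node should be stopped, false if the failure is tolerated. */
    public static boolean shouldStopNode(Throwable err) {
        return !hasCauseNamed(err, TOLERATED);
    }
}
```

A custom handler could call shouldStopNode on the error carried by the failure context and delegate to the previously configured handler only when it returns true. That wiring is left out because it needs the Ignite classes on the classpath. As noted elsewhere in the thread, a node whose critical worker has terminated may not be able to keep working, so ignoring the failure is a stopgap at best.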
> >
> > On Mon, Mar 25, 2019 at 08:38, Roman Shtykh <rshtykh@yahoo.com.invalid> wrote:
> >
> > > Igniters,
> > >
> > > Restarting a node while injecting data and having it expire results in
> > > GridDhtInvalidPartitionException, which terminates nodes with
> > > SYSTEM_WORKER_TERMINATION one by one, taking the whole cluster down. This
> > > is really bad, and I didn't find a way to save the cluster from
> > > disappearing. I created a JIRA issue
> > > https://issues.apache.org/jira/browse/IGNITE-11620
> > > with a test case. Any clues on how to fix this inconsistency when
> > > rebalancing?
> > >
> > > -- Roman
> > >
> >
