ignite-dev mailing list archives

From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Prohibit stateful affinity (FairAffinityFunction)
Date Mon, 10 Apr 2017 15:11:59 GMT
Guys,

To my knowledge FairAffinity, which is the most balanced distribution
possible, works just fine whenever the caches are configured on startup. I
think we should keep it, but throw an exception whenever a cache is started
dynamically (after the system start) with FairAffinity configured. Am I
missing something here?

As for RendezvousAffinity, I don't like that we start migrating extra
partitions. To my knowledge, Michael Grigs implemented a close to even
partition distribution with a much better hash function. Do we really need
to improve it even more?

D.
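
For reference, the two properties argued about in this thread (a stateless
mapping and minimal partition movement) can be illustrated with a minimal
sketch of plain rendezvous (highest-random-weight) hashing. This is not
Ignite's implementation: the function names, the node ids, and the use of
MD5 over a "node:partition" string are assumptions made for illustration
only, and backups and the proposed balancer are left out.

```python
import hashlib

def weight(node: str, partition: int) -> int:
    # Hypothetical weight: first 8 bytes of MD5 over "node:partition".
    digest = hashlib.md5(f"{node}:{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def assign(nodes, partitions):
    # Each partition goes to the node with the highest weight for it.
    # The result depends only on the node set, not on any earlier state,
    # so every cache computing it on the same topology gets the same map.
    return {
        p: max(nodes, key=lambda n: (weight(n, p), n))
        for p in range(partitions)
    }

nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node ids
before = assign(nodes, 1024)
after = assign([n for n in nodes if n != "node-c"], 1024)

# Partitions whose owner changed after node-c left the topology.
moved = [p for p in before if before[p] != after[p]]
print(len(moved), "of", len(before), "partitions moved")
```

In this sketch only the partitions that were owned by the departed node
change owner, and a second cache calling assign() on the same node set gets
the same map regardless of when it was started. Any balancing step layered
on top would have to give up part of the first property to even out the
distribution.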

On Mon, Apr 10, 2017 at 8:00 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
wrote:

> Absolutely agree, let's get some numbers on RendezvousAffinity with both
> variants: useBalancer enabled and disabled. Taras, can you provide them?
>
> Anyway, at the moment we need to make a decision on what will get into 2.0.
> I'm for dropping (or hiding) all the suspicious stuff and adding it back if
> we fix it. Thus I'm going to move FairAffinity into a private package now.
>
> Sergi
>
> 2017-04-10 16:55 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>
> > Sergi,
> >
> > AFAIK the only reason why RendezvousAffinity is used by default is that
> > behavior on rebalance is no less important than steady state performance,
> > especially on large deployments and cloud environments, where nodes
> > constantly join and leave the topology. Let's stop guessing and discuss the
> > numbers: how many partition reassignments happen with the new
> > RendezvousAffinity flavor? I haven't seen any results so far.
> >
> > On Mon, Apr 10, 2017 at 4:39 PM, Andrey Gura <agura@apache.org> wrote:
> >
> > > Guys,
> > >
> > > It seems that both mentioned problems have the same root cause: each
> > > cache has its own affinity function instance. This leads to a
> > > performance problem (we repeat the same calculations for each cache)
> > > and to a problem caused by the fact that FairAffinityFunction is
> > > stateful (a co-located cache gets a different assignment if it was
> > > started on a different topology).
> > >
> > > The obvious solution is to use the same affinity for a set of caches.
> > > As a result, all caches from one set will use the same assignment, which
> > > will be calculated exactly once and will not depend on the cache start
> > > topology.
> > >
> > > On Mon, Apr 10, 2017 at 4:05 PM, Sergi Vladykin
> > > <sergi.vladykin@gmail.com> wrote:
> > > > > As for the default value of the useBalancer flag, I agree with Yakov:
> > > > > it must be enabled by default, because performance in steady state is
> > > > > usually more important than performance of rebalancing. For edge
> > > > > cases it can be disabled.
> > > >
> > > > Sergi
> > > >
> > > > 2017-04-10 15:04 GMT+03:00 Sergi Vladykin <sergi.vladykin@gmail.com
> >:
> > > >
> > > > >> If the RendezvousAffinity with enabled useBalancer is not much worse
> > > > >> than FairAffinity, I see no reason to keep the latter.
> > > >>
> > > >> Sergi
> > > >>
> > > >> 2017-04-10 13:00 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> > > >>
> > > >>> Guys,
> > > >>>
> > > > >>> We should not have it enabled by default because as Taras mentioned:
> > > > >>> "but in this case there is not guarantee that a partition doesn't
> > > > >>> move from one node to another when node leave topology". Let's avoid
> > > > >>> any rush here. There is nothing terribly wrong with FairAffinity. It
> > > > >>> is not enabled by default and at the very least we can always mark
> > > > >>> it as deprecated. It is better to test rigorously rendezvous
> > > > >>> affinity first in terms of partition distribution and partition
> > > > >>> migration and decide whether results are acceptable.
> > > >>>
> > > > >>> On Mon, Apr 10, 2017 at 12:43 PM, Yakov Zhdanov <yzhdanov@apache.org>
> > > > >>> wrote:
> > > >>>
> > > >>> > We should have it enabled by default.
> > > >>> >
> > > >>> > --Yakov
> > > >>> >
> > > >>> > 2017-04-10 12:42 GMT+03:00 Sergi Vladykin <
> > sergi.vladykin@gmail.com
> > > >:
> > > >>> >
> > > >>> > > Why wouldn't we have useBalancer always enabled?
> > > >>> > >
> > > >>> > > Sergi
> > > >>> > >
> > > >>> > > 2017-04-10 12:31 GMT+03:00 Taras Ledkov <tledkov@gridgain.com
> >:
> > > >>> > >
> > > >>> > > > Folks,
> > > >>> > > >
> > > > >>> > > > I worked on issue
> > > > >>> > > > https://issues.apache.org/jira/browse/IGNITE-3018 that
> > > > >>> > > > is related to performance of Rendezvous AF.
> > > >>> > > >
> > > > >>> > > > But the Wang/Jenkins integer hash distribution is worse
> > > > >>> > > > than MD5. So I tried to use a simple partition balancer,
> > > > >>> > > > close to Fair AF, for Rendezvous AF.
> > > >>> > > >
> > > > >>> > > > Take a look at the heatmaps of distributions at the issue,
> > > > >>> > > > e.g.:
> > > > >>> > > > - Comparison of current Rendezvous AF and new Rendezvous AF
> > > > >>> > > > based on Wang/Jenkins hash:
> > > > >>> > > > https://issues.apache.org/jira/secure/attachment/12858701/004.png
> > > > >>> > > > - Comparison of current Rendezvous AF and new Rendezvous AF
> > > > >>> > > > based on Wang/Jenkins hash with partition balancer:
> > > > >>> > > > https://issues.apache.org/jira/secure/attachment/12858690/balanced.004.png
> > > >>> > > >
> > > > >>> > > > When the balancer is enabled the distribution of partitions
> > > > >>> > > > by nodes looks close to an even distribution
> > > > >>> > > > but in this case there is not guarantee that a partition
> > > > >>> > > > doesn't move from one node to another
> > > > >>> > > > when node leave topology.
> > > > >>> > > > It is not guaranteed, but we try to minimize it because a
> > > > >>> > > > sorted array of nodes is used (as in pure Rendezvous AF).
> > > >>> > > >
> > > > >>> > > > I think we can use the new fast Rendezvous AF with the
> > > > >>> > > > 'useBalancer' flag instead of Fair AF.
> > > >>> > > >
> > > >>> > > > On 09.04.2017 14:12, Valentin Kulichenko wrote:
> > > >>> > > >
> > > >>> > > >> What is the replacement for FairAffinityFunction?
> > > >>> > > >>
> > > > >>> > > >> Generally I agree. If FairAffinityFunction can't be changed
> > > > >>> > > >> to provide consistent mapping, it should be dropped.
> > > >>> > > >>
> > > >>> > > >> -Val
> > > >>> > > >>
> > > > >>> > > >> On Sun, Apr 9, 2017 at 3:50 AM, Sergi Vladykin
> > > > >>> > > >> <sergi.vladykin@gmail.com> wrote:
> > > >>> > > >>
> > > >>> > > >>     Guys,
> > > >>> > > >>
> > > > >>> > > >>     It appeared that our FairAffinityFunction can assign
> > > > >>> > > >>     the same partitions to different nodes for different
> > > > >>> > > >>     caches.
> > > > >>> > > >>
> > > > >>> > > >>     It basically means that there is no collocation
> > > > >>> > > >>     between the caches at all even if they have the same
> > > > >>> > > >>     affinity.
> > > > >>> > > >>
> > > > >>> > > >>     As a result all SQL joins will not work (even
> > > > >>> > > >>     collocated ones), and other operations that rely on
> > > > >>> > > >>     cache collocation will be either broken or work
> > > > >>> > > >>     slower than expected.
> > > > >>> > > >>
> > > > >>> > > >>     All this stuff is really non-obvious. And I see no
> > > > >>> > > >>     reason why we should allow that. I suggest prohibiting
> > > > >>> > > >>     this behavior and dropping FairAffinityFunction before
> > > > >>> > > >>     2.0. We have to clearly document that the same
> > > > >>> > > >>     affinity function must provide the same partition
> > > > >>> > > >>     assignments for all the caches.
> > > > >>> > > >>
> > > > >>> > > >>     Also I know that Taras Ledkov was working on a decent
> > > > >>> > > >>     stateless replacement for FairAffinity, so we should
> > > > >>> > > >>     not lose anything here.
> > > >>> > > >>
> > > >>> > > >>     Thoughts?
> > > >>> > > >>
> > > >>> > > >>     Sergi
> > > >>> > > >>
> > > >>> > > >>
> > > >>> > > >>
> > > >>> > > > --
> > > >>> > > > Taras Ledkov
> > > >>> > > > Mail-To: tledkov@gridgain.com
> > > >>> > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > >
> >
>
