ignite-dev mailing list archives

From Eduard Shangareev <eduard.shangar...@gmail.com>
Subject Re: New definition for affinity node (issues with baseline)
Date Tue, 24 Apr 2018 17:29:37 GMT
Igniters,

I introduced DAT as a counterpart to BLAT (SAT) because together they
reflect how Ignite currently works.

But I actually have doubts about whether such a separation is necessary.

DAT exists only because we don't want to lose any data in in-memory caches.

But there are alternatives. Besides BLAT auto-change policies, I would draw
attention to the following approach:
- for in-memory caches, affinity would be calculated with SAT/BLAT in the
first step, so collocation between in-memory and persistent caches would
work;
- in the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.
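A minimal Java sketch of the two-step idea above, for illustration only. The class and method names are hypothetical, round-robin stands in for the real affinity function, and backups are ignored; it only shows that partitions of alive baseline nodes keep their step-1 owners while partitions of offline baseline nodes are spread over alive nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the two-step assignment for an in-memory cache:
 * step 1 computes partitions over the baseline (SAT/BLAT), step 2 moves the
 * partitions of offline baseline nodes to alive nodes. Round-robin stands in
 * for the real affinity function, and backups are ignored.
 */
public class TwoStepAffinity {

    /** Step 1: map each partition to a baseline node (placeholder round-robin). */
    public static Map<Integer, String> assignByBaseline(List<String> baseline, int parts) {
        Map<Integer, String> assignment = new TreeMap<>();
        for (int p = 0; p < parts; p++)
            assignment.put(p, baseline.get(p % baseline.size()));
        return assignment;
    }

    /** Step 2: keep partitions whose owner is alive, re-home the rest. */
    public static Map<Integer, String> spreadOffline(Map<Integer, String> assignment,
                                                     Set<String> alive) {
        if (alive.isEmpty())
            throw new IllegalStateException("no alive nodes to take over partitions");
        List<String> targets = new ArrayList<>(new TreeSet<>(alive)); // deterministic order
        Map<Integer, String> result = new TreeMap<>();
        int next = 0;
        for (Map.Entry<Integer, String> e : assignment.entrySet()) {
            String owner = e.getValue();
            if (alive.contains(owner))
                result.put(e.getKey(), owner); // same owner as in step 1
            else
                result.put(e.getKey(), targets.get(next++ % targets.size())); // offline owner: spread
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> baseline = List.of("A", "B", "C");
        Set<String> alive = Set.of("A", "C"); // baseline node B is offline
        Map<Integer, String> step1 = assignByBaseline(baseline, 6);
        Map<Integer, String> step2 = spreadOffline(step1, alive);
        System.out.println(step1); // {0=A, 1=B, 2=C, 3=A, 4=B, 5=C}
        System.out.println(step2); // {0=A, 1=A, 2=C, 3=A, 4=C, 5=C}
    }
}
```

Partitions 0, 2, 3 and 5 stay where step 1 put them, so an in-memory cache and a persistent cache computed over the same baseline would remain collocated for those partitions; only the partitions of offline node B move.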

I don't want to propose any changes until we have consensus.



On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> Vladimir,
>
> Automatic cluster membership changes may be implemented to grow the
> topology, but auto-shrinking the topology is usually not possible because
> a process cannot distinguish between a node shutdown and network
> partitioning. If we want to deal with split-brain scenarios as a grown-up
> system, we should change the replication strategy within partitions to a
> consensus algorithm (I really hope we will). None of the consensus
> algorithms (at least those known to me: Paxos, Raft, ZAB) do automatic
> cluster adjustments based on an internally detected process failure. I
> consider baseline topology a step towards this model.
>
> Addressing your second concern: if a node was down for a short period of
> time, we should (and we do) rebalance only deltas, which is faster than
> erasing the whole node and moving all data from scratch.
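To illustrate the delta idea, here is a hypothetical Java sketch, assuming each partition tracks an update counter and the current owner keeps a history of updates (for example in its WAL) from some counter onward. The names OwnerState, historyStartCounter and plan() are invented for this example and are not Ignite's API.

```java
/**
 * Hypothetical sketch: per partition, decide whether a node returning after a
 * short outage can catch up from a delta (historical) rebalance or needs the
 * whole partition. The counter names are illustrative, not Ignite internals.
 */
public class RebalancePlanner {

    public enum Mode { NONE, DELTA, FULL }

    /** State of one partition on the node that currently owns it. */
    public static final class OwnerState {
        final long updateCounter;       // latest update applied by the owner
        final long historyStartCounter; // history can replay every update after this value

        public OwnerState(long updateCounter, long historyStartCounter) {
            this.updateCounter = updateCounter;
            this.historyStartCounter = historyStartCounter;
        }
    }

    /** @param returningCounter last update counter the returning node had applied */
    public static Mode plan(long returningCounter, OwnerState owner) {
        if (returningCounter == owner.updateCounter)
            return Mode.NONE;  // nothing was missed while the node was away
        if (returningCounter >= owner.historyStartCounter)
            return Mode.DELTA; // missed updates are still in history: send only them
        return Mode.FULL;      // history no longer covers the gap: send everything
    }

    public static void main(String[] args) {
        OwnerState owner = new OwnerState(1_000, 900);
        System.out.println(plan(1_000, owner)); // NONE
        System.out.println(plan(950, owner));   // DELTA
        System.out.println(plan(100, owner));   // FULL
    }
}
```

The shorter the outage, the more likely the missed updates are still covered by history, which is why a brief absence can be repaired with a delta instead of a full transfer.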
>
> 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>
> > Ivan,
> >
> > This reasoning sounds questionable to me. First, separate logic for
> > in-memory and persistent regions means that we lose collocation between
> > persistent and non-persistent caches. Second, the “data is still on
> > disk” assumption might not be valid if a node has left due to a disk
> > crash, or when data has been updated on the remaining nodes.
> >
> > On Tue, Apr 24, 2018 at 19:21, Ivan Rakov <ivan.glukos@gmail.com> wrote:
> >
> > > Stan,
> > >
> > > I believe it was discussed in the design proposal thread:
> > >
> > > http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
> > >
> > > The short answer: the backup factor decreases if a node leaves. In
> > > non-persistent mode we have to rebalance data ASAP; otherwise the last
> > > node that owns a partition may fail and the data will be lost forever.
> > > This is not necessary if data is persisted to disk storage, which is
> > > the reason for the Baseline Topology concept.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > > +1 for Vladimir's point: adding more complexity may (and likely
> > > > will) be even more misleading.
> > > >
> > > > Can we take a step back and discuss why we need different behavior
> > > > for persistent and in-memory caches? Can we make in-memory caches
> > > > honor the baseline instead of special-casing them?
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > >
> > > > On Tue, Apr 24, 2018, 18:28 Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > >
> > > >> Guys,
> > > >>
> > > >> As a user, I definitely do not want to think about BLATs, SATs, DATs
> > > >> or whatever else. I want to query data, iterate over data, and send
> > > >> compute tasks to data. If a certain node is outside of the BLAT and
> > > >> does not have data, then it is not an affinity node. Can we just fix
> > > >> the affinity logic to take the BLAT into account appropriately?
> > > >>
> > > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
> > > >>
> > > >>> Eduard,
> > > >>>
> > > >>> Can you please summarize the code changes you are proposing?
> > > >>> I agree that BLT is a somewhat misleading term and DAT/SAT make more
> > > >>> sense. However, establishing consensus on the v2.4 Baseline Topology
> > > >>> terminology took a long time, and it seems you are going to cause
> > > >>> some more perturbation.
> > > >>> I still don't understand what should be changed and how. Please
> > > >>> provide a summary of the upcoming class renamings and changes to
> > > >>> existing system parts.
> > > >>>
> > > >>> Best Regards,
> > > >>> Ivan Rakov
> > > >>>
> > > >>>
> > > >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> > > >>>
> > > >>>> Hi, Igniters,
> > > >>>>
> > > >>>> I want to raise a topic about our affinity node definition.
> > > >>>>
> > > >>>> After adding the baseline (affinity) topology (BL(A)T), things have
> > > >>>> become complicated.
> > > >>>>
> > > >>>> Plenty of bugs have appeared:
> > > >>>>
> > > >>>> IGNITE-8173
> > > >>>> ignite.getOrCreateCache(cacheConfig).iterator() method works
> > > >>>> incorrectly for a replicated cache if some data node isn't in the
> > > >>>> baseline
> > > >>>>
> > > >>>> IGNITE-7628
> > > >>>> SqlQuery hangs indefinitely with an additional node not registered
> > > >>>> in the baseline.
> > > >>>>
> > > >>>> This is because everything relies on the concept of an "affinity
> > > >>>> node". Until now it was simple: a server node which passes the node
> > > >>>> filter. In other words, any server node which is not filtered out by
> > > >>>> the node filter.
> > > >>>>
> > > >>>> But a node which is not in the BL(A)T and which passes the node
> > > >>>> filter would be treated as an affinity node, and that is definitely
> > > >>>> wrong. At the very least, it is a source of many bugs (I believe
> > > >>>> there are many more than the two I have already mentioned).
> > > >>>>
> > > >>>> It's clear that this definition should be changed.
> > > >>>> Let's start with a new definition of "Affinity topology". Affinity
> > > >>>> topology is the set of nodes which could potentially keep data.
> > > >>>>
> > > >>>> Based on the current implementation, we can say that:
> > > >>>> 1. for in-memory cache groups it would be all server nodes;
> > > >>>> 2. for persistent cache groups it would be the BL(A)T.
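The two cases above can be sketched as a simple choice in Java. This is a hypothetical illustration with string node IDs; the class and method names are invented, and Ignite's real internals are not modeled.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of how the affinity topology (AT) of a cache group is
 * chosen under the current behavior: DAT for in-memory groups, SAT for
 * persistent ones. Node IDs are plain strings for brevity.
 */
public class AffinityTopologyChooser {

    public static Set<String> affinityTopology(boolean persistent,
                                               Set<String> aliveServerNodes,
                                               Set<String> baselineNodes) {
        // Persistent group: SAT, i.e. the BL(A)T as configured, offline members included.
        if (persistent)
            return new TreeSet<>(baselineNodes);
        // In-memory group: DAT, i.e. every alive server node, in the baseline or not.
        return new TreeSet<>(aliveServerNodes);
    }

    public static void main(String[] args) {
        Set<String> servers = Set.of("A", "B", "D");  // D joined but was never added to the baseline
        Set<String> baseline = Set.of("A", "B", "C"); // C is in the baseline but offline
        System.out.println(affinityTopology(false, servers, baseline)); // [A, B, D]
        System.out.println(affinityTopology(true, servers, baseline));  // [A, B, C]
    }
}
```

Making the choice configurable per cache group, as raised later in the message, would amount to supplying the `persistent` flag from a cache group setting instead of deriving it from persistence configuration.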
> > > >>>>
> > > >>>> From here on, I will use Dynamic Affinity Topology (DAT) for point 1
> > > >>>> (in-memory cache groups) and Static Affinity Topology (SAT) instead
> > > >>>> of BL(A)T for point 2.
> > > >>>> Denote the node filter as f(X), where X is an affinity topology.
> > > >>>>
> > > >>>> Then we can say that node A is an affinity node if
> > > >>>> A ∈ AT', where AT' = f(AT) and AT is either DAT or SAT.
> > > >>>>
> > > >>>> It is worth mentioning that AT' is what should be passed to the
> > > >>>> affinity function of cache groups.
> > > >>>> Also, AT and AT' can change over time (BL(A)T changes or node
> > > >>>> joins/disconnections).
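The definition A ∈ AT', AT' = f(AT) can be sketched in Java as follows, next to the old rule (any server node passing the node filter). Node IDs are strings and the node filter is a plain Predicate; all names are hypothetical. Node D, which passes the filter but is outside the baseline, is the situation the message connects to IGNITE-8173 and IGNITE-7628.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of the proposed affinity node definition. */
public class AffinityNodeCheck {

    /** AT' = f(AT): the cache group's node filter applied to its affinity topology. */
    public static Set<String> filteredTopology(Set<String> at, Predicate<String> nodeFilter) {
        return at.stream().filter(nodeFilter).collect(Collectors.toCollection(TreeSet::new));
    }

    /** Proposed definition: node A is an affinity node iff A is in AT'. */
    public static boolean isAffinityNode(String node, Set<String> at, Predicate<String> nodeFilter) {
        return filteredTopology(at, nodeFilter).contains(node);
    }

    /** Old definition: any server node that passes the node filter. */
    public static boolean isAffinityNodeOld(String node, Set<String> serverNodes,
                                            Predicate<String> nodeFilter) {
        return serverNodes.contains(node) && nodeFilter.test(node);
    }

    public static void main(String[] args) {
        Set<String> servers = Set.of("A", "B", "C", "D"); // D is not in the baseline
        Set<String> sat = Set.of("A", "B", "C");          // AT of a persistent group
        Predicate<String> filter = n -> !n.equals("B");   // example node filter excluding B

        System.out.println(filteredTopology(sat, filter));           // [A, C], the input to the affinity function
        System.out.println(isAffinityNode("A", sat, filter));        // true
        System.out.println(isAffinityNode("D", sat, filter));        // false: not in SAT
        System.out.println(isAffinityNodeOld("D", servers, filter)); // true: what the old rule gets wrong
    }
}
```

Since AT and AT' can change over time, filteredTopology() would be recomputed on every baseline change or node join/leave before the affinity function is called again.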
> > > >>>>
> > > >>>> And I don't like the fact that the choice between DAT and SAT depends
> > > >>>> on persistence settings (should we make it configurable per cache
> > > >>>> group?).
> > > >>>>
> > > >>>> OK, I have created a ticket to implement these changes and will start
> > > >>>> working on it:
> > > >>>> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> > > >>>> calculation doesn't take into account BLT).
> > > >>>>
> > > >>>> Also, I want to use these definitions (Affinity Topology, Affinity
> > > >>>> Node, DAT, SAT) in the documentation and javadocs.
> > > >>>>
> > > >>>> Maybe we should also consider replacing BL(A)T with SAT.
> > > >>>>
> > > >>>> Thank you for your attention.
> > > >>>>
> > > >>>>
> > >
> > >
> >
>
