helix-commits mailing list archives

From "Kanak Biscuitwala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-276) Allow FULL_AUTO mode to favor some transitions
Date Wed, 23 Oct 2013 01:15:42 GMT

    [ https://issues.apache.org/jira/browse/HELIX-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802490#comment-13802490 ]

Kanak Biscuitwala commented on HELIX-276:
-----------------------------------------

There needs to be some more specification here. Here is an example case where transition preferences
may not make sense:

1. The first node is launched
2. Helix assigns one replica of each partition to the node, and puts them all in Master state
(as is governed by state priorities)
3. A second node is launched
4. Helix assigns a second replica of each partition to the second node

Here, we probably want some of the replicas on the second node to be in state Master. Otherwise,
a single node failure would force a large number of Slave --> Master transitions at once.
However, promoting replicas on the newly added node would violate potential transition preferences.
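
To make the cost concrete, here is a small Python sketch. It is not Helix code; the function
name, node names, and partition names are all hypothetical. It counts how many Slave --> Master
promotions a single node failure forces, assuming two nodes that each hold one replica of
every partition.

```python
def promotions_on_failure(master_of, failed_node):
    """Count partitions whose Master replica lived on failed_node.

    master_of maps partition name -> node currently holding the Master replica.
    Each such partition needs one Slave --> Master promotion on a surviving node.
    """
    return sum(1 for node in master_of.values() if node == failed_node)


partitions = [f"p{i}" for i in range(8)]

# Existing Masters are never moved: node1 keeps every Master.
skewed = {p: "node1" for p in partitions}

# State balance is enforced: Masters alternate between the two nodes.
balanced = {p: ("node1" if i % 2 == 0 else "node2") for i, p in enumerate(partitions)}

print(promotions_on_failure(skewed, "node1"))    # 8
print(promotions_on_failure(balanced, "node1"))  # 4
```

With the skewed placement, losing node1 forces a promotion for every partition at once; with
the balanced placement, losing either node forces only half as many.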

An alternative is that we "prefer" existing replicas, but not at the expense of state balance.
This is more of a "we'll try our best not to force multiple transitions" decision.
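
One way to read "prefer existing replicas, but not at the expense of state balance" is as a
sort key: balance is compared first, and the current replica state only breaks ties. The
Python sketch below illustrates that reading. It is a greedy toy, not the Helix rebalancer,
and the function name, state names, and data shapes are assumptions made for the example.

```python
from collections import Counter


def assign_masters(replicas, current_state):
    """Pick one Master per partition.

    replicas: partition -> list of live nodes holding a replica of it.
    current_state: (partition, node) -> "MASTER", "SLAVE", or "OFFLINE";
        a missing entry is treated as "OFFLINE" (a brand-new replica).
    Returns partition -> node chosen as Master.
    """
    # Lower rank means fewer transitions are needed to reach Master.
    rank = {"MASTER": 0, "SLAVE": 1, "OFFLINE": 2}
    masters = {}
    load = Counter()  # number of Masters assigned to each node so far

    for partition in sorted(replicas):
        def key(node):
            state = current_state.get((partition, node), "OFFLINE")
            # Balance first, transition cost second, node name for determinism.
            return (load[node], rank[state], node)

        chosen = min(replicas[partition], key=key)
        masters[partition] = chosen
        load[chosen] += 1
    return masters


# node1 already masters everything; node2 has just joined with new replicas.
replicas = {f"p{i}": ["node1", "node2"] for i in range(4)}
state = {(p, "node1"): "MASTER" for p in replicas}
print(assign_masters(replicas, state))
```

In the two-node scenario above this still splits the Masters evenly, but whenever two
candidate nodes carry the same number of Masters, the one whose replica is already Master or
Slave wins over a node that would have to start from Offline.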

Another alternative is that we only try to do this when nodes are removed (but not added).

In any case, the "right" thing to do is probably to list out a few more scenarios, come up
with configuration properties associated with those scenarios, and then expose those
as an API. I'm inclined to leave the "default" behavior more or less as-is, with an additional
tiebreaker, but a config API would help individual apps choose the right policy for their
applications.
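
As a purely illustrative sketch of what such configuration properties could look like, the
Python below names one policy per alternative discussed in this comment. None of these names
exist in Helix; the enum, its values, and the event strings are hypothetical placeholders
until an actual design is written up.

```python
from enum import Enum


class TransitionPreference(Enum):
    """Hypothetical per-resource policy values, one per alternative above."""
    NONE = "none"                        # balance only, no transition preference
    TIEBREAKER = "tiebreaker"            # prefer existing replicas when balance is equal
    ON_REMOVAL_ONLY = "on_removal_only"  # apply the preference only after a node leaves


def preference_applies(policy, event):
    """Return True if the transition preference should be considered.

    event is "node_added" or "node_removed" (hypothetical event names).
    """
    if policy is TransitionPreference.NONE:
        return False
    if policy is TransitionPreference.ON_REMOVAL_ONLY:
        return event == "node_removed"
    return True


print(preference_applies(TransitionPreference.ON_REMOVAL_ONLY, "node_added"))
print(preference_applies(TransitionPreference.TIEBREAKER, "node_added"))
```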

I will work on a design for this API and will add updates to this ticket.

> Allow FULL_AUTO mode to favor some transitions
> ----------------------------------------------
>
>                 Key: HELIX-276
>                 URL: https://issues.apache.org/jira/browse/HELIX-276
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Matthieu Morel
>            Assignee: Kanak Biscuitwala
>
> In FULL_AUTO mode, Helix computes both partitioning and states.
> Currently, in a master-replica model, when rebalancing due to a failure of the master
> node, Helix does not promote an existing replica to master, but instead assigns a new master
> (i.e. offline -> replica -> master).
> The current algorithm optimizes for minimal partition movement and even distribution
> of state. However, it should also take into account the priorities between states, or provide
> a way to customize it. For instance, when it is more costly (number of transitions, priorities)
> to perform offline -> master than replica -> master, the algorithm could favor replica
> -> master transitions.
> One application would be for quick failover: master ops are logged to a journal, a replica
> builds its state by tailing the journal, and upon failure of the master, recovery is fast
> since only a few operations may have to be played to reach the latest state of the master.
> If a new node is assigned master role from scratch, the whole journal must be replayed.
> More context in this thread:
> http://markmail.org/message/inq6tnlnk5ckscwr



--
This message was sent by Atlassian JIRA
(v6.1#6144)
