helix-commits mailing list archives

From "Kanak Biscuitwala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-276) Allow FULL_AUTO mode to favor some transitions
Date Wed, 23 Oct 2013 01:21:42 GMT

    [ https://issues.apache.org/jira/browse/HELIX-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802496#comment-13802496 ]

Kanak Biscuitwala commented on HELIX-276:

Also, for the "quick failover" case in particular, we could support temporarily honoring transition
preferences until the new node is caught up, and then do the Slave --> Master transition.
State balance would be skewed while the bootstrapping is going on, but it would eventually
be restored. This may be a good way to go, especially in the short term.
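
A minimal sketch of that idea, under one reading of the comment: while the node chosen for state balance is still bootstrapping, mastership goes to an existing caught-up slave, and it moves to the balanced choice once that node has caught up. Everything here (`Node`, `pick_master`, `TRANSITION_COST`, the cost values) is hypothetical and for illustration only; none of it is Helix API or Helix configuration.

```python
from dataclasses import dataclass

# Hypothetical relative costs of reaching MASTER from a given state.
# These numbers are assumptions for illustration, not Helix settings.
TRANSITION_COST = {
    ("SLAVE", "MASTER"): 1,
    ("OFFLINE", "MASTER"): 10,  # really OFFLINE -> SLAVE -> MASTER plus bootstrap
}


@dataclass
class Node:
    name: str
    state: str       # "OFFLINE", "SLAVE", or "MASTER"
    caught_up: bool  # True once the node holds the latest data


def pick_master(nodes, balanced_choice):
    """Return the name of the node that should be master right now.

    balanced_choice is the node a balance-optimizing rebalancer would pick.
    """
    by_name = {n.name: n for n in nodes}
    preferred = by_name[balanced_choice]
    if preferred.caught_up:
        # Bootstrapping is done: restore state balance.
        return preferred.name
    # Temporarily honor transition preferences: take the cheapest
    # transition to MASTER among nodes that are already caught up.
    candidates = [
        n for n in nodes
        if n.caught_up and (n.state, "MASTER") in TRANSITION_COST
    ]
    if not candidates:
        # Nothing better is available; wait for the balanced choice.
        return preferred.name
    best = min(candidates, key=lambda n: (TRANSITION_COST[(n.state, "MASTER")], n.name))
    return best.name
```

With a caught-up slave `n1` and a still-bootstrapping `n3` as the balanced choice, `pick_master` returns `n1`; once `n3` reports `caught_up=True`, the same call returns `n3`.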

> Allow FULL_AUTO mode to favor some transitions
> ----------------------------------------------
>                 Key: HELIX-276
>                 URL: https://issues.apache.org/jira/browse/HELIX-276
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Matthieu Morel
>            Assignee: Kanak Biscuitwala
> In FULL_AUTO mode, Helix computes both partitioning and states.
> Currently, in a master-replica model, when rebalancing due to a failure of the master
> node, Helix does not promote an existing replica to master, but instead assigns a new master
> (i.e. offline -> replica -> master).
> The current algorithm optimizes for minimal partition movement and even distribution
> of state. However, it should also take into account the priorities between states, or provide
> a way to customize it. For instance, when it is more costly (number of transitions, priorities)
> to perform offline -> master than replica -> master, the algorithm could favor
> replica -> master transitions.
> One application would be for quick failover: master ops are logged to a journal, a replica
> builds its state by tailing the journal, and upon failure of the master, recovery is fast
> since only a few operations may have to be played to reach the latest state of the master.
> If a new node is assigned master role from scratch, the whole journal must be replayed.
> More context in this thread:
> http://markmail.org/message/inq6tnlnk5ckscwr
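
The recovery-cost argument in the quoted description can be made concrete with a small sketch. It assumes the journal is an append-only log addressed by offset and that each node tracks how many entries it has applied; the function name and the numbers are hypothetical and not taken from Helix or from the issue.

```python
def ops_to_replay(journal_length, applied_offset):
    """Journal entries a node must still apply before it can act as master."""
    if not 0 <= applied_offset <= journal_length:
        raise ValueError("applied_offset must be within the journal")
    return journal_length - applied_offset


# Hypothetical numbers: a replica that has been tailing the journal
# versus a node that is assigned the master role from scratch.
journal_length = 1_000_000
replica_backlog = ops_to_replay(journal_length, applied_offset=999_990)
fresh_node_backlog = ops_to_replay(journal_length, applied_offset=0)
```

In this example the tailing replica has 10 entries left to apply, while the fresh node has to replay all 1,000,000, which is the gap the requested transition preference is meant to avoid.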

This message was sent by Atlassian JIRA