helix-commits mailing list archives

From "kishore gopalakrishna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-56) Delayed state transition
Date Mon, 04 Mar 2013 06:15:13 GMT

    [ https://issues.apache.org/jira/browse/HELIX-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592017#comment-13592017 ]

kishore gopalakrishna commented on HELIX-56:
--------------------------------------------

Response from Kishore

On question 3: we already have the ability to trigger a rebalance periodically. Take a look
at the timer tasks in the controller. But I don't think that will be sufficient in your case.

There is another way to solve this that is probably easier to reason about and more elegant.
Basically, we can introduce the notion of a timed transition (we can discuss how to implement
this). This means that when a node fails, Helix can request another node to create the replica,
but with additional configuration saying that the transition should be scheduled after a
timeout X. We already have a notion of cancellable transitions built in, so if the old node
comes back up within that time, Helix can cancel the existing transition and put the old node
back into the SLAVE state.
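
To make the idea concrete, here is a minimal sketch of a timed, cancellable transition. It is
written in plain Python for brevity and does not use any Helix API: the class TimedTransition,
its fields, and the action callback are all hypothetical names chosen for illustration. The
transition runs only after the delay elapses, and cancel() succeeds only if it has not yet run.

```python
import threading


class TimedTransition:
    """Hypothetical sketch: a transition that runs after delay_s unless cancelled first."""

    def __init__(self, partition, from_state, to_state, delay_s, action):
        self.partition = partition
        self.from_state = from_state
        self.to_state = to_state
        self.executed = False
        self._action = action          # callback(partition, from_state, to_state)
        self._cancelled = False
        self._lock = threading.Lock()  # guards executed/_cancelled against races
        self._timer = threading.Timer(delay_s, self._run)

    def start(self):
        """Begin counting down the delay."""
        self._timer.start()

    def _run(self):
        # Called on the timer thread once the delay has elapsed.
        with self._lock:
            if self._cancelled:
                return
            self.executed = True
        self._action(self.partition, self.from_state, self.to_state)

    def cancel(self):
        """Return True if the transition was cancelled before it ran."""
        with self._lock:
            if self.executed:
                return False
            self._cancelled = True
        self._timer.cancel()
        return True

    def wait(self):
        """Block until the timer thread has finished (ran or was cancelled)."""
        self._timer.join()
```

In the scenario above, the controller would start such a transition (OFFLINE to SLAVE on the
new node) when the old node fails, and call cancel() if the old node reconnects before the
delay expires.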

This design does not require any additional work to handle failures of controllers or participants,
nor any modification to the state model. It basically adds the notion of a timed transition that
can be cancelled if needed.

What do you think about the solution? Does it make sense?

Regarding implementation, this solution can be implemented in the current state by simply
adding a sleep to the OFFLINE to SLAVE transition; in the custom code invoker you can then
first send a cancel message for the existing transition and then set the ideal state.
But it's possible for Helix to cancel it automatically. We would need additional logic in
Helix so that if there is a pending transition and we compute another transition that is the
opposite of it, Helix detects that the pending one is cancellable and cancels it.
That will make it more generic, and we can then simply have the transition delay set as a configuration.
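
The automatic detection could look roughly like the sketch below. This is plain Python, not
Helix code: Transition, is_opposite, and reconcile are hypothetical names, and the sketch
assumes that "opposite" means same partition and instance with the from and to states swapped.
It also assumes that once the pending transition is cancelled, the newly computed opposite
transition no longer needs to be sent, since the replica never left its original state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """Hypothetical record of one state transition for a partition on an instance."""
    partition: str
    instance: str
    from_state: str
    to_state: str


def is_opposite(pending, computed):
    """True if computed would exactly undo pending on the same partition and instance."""
    return (
        pending.partition == computed.partition
        and pending.instance == computed.instance
        and pending.from_state == computed.to_state
        and pending.to_state == computed.from_state
    )


def reconcile(pending, computed):
    """Split the work into pending transitions to cancel and new transitions to send.

    Assumption: a computed transition that is the opposite of a pending one is
    dropped, because cancelling the pending one already yields the desired state.
    """
    to_cancel = [p for p in pending if any(is_opposite(p, c) for c in computed)]
    to_send = [c for c in computed if not any(is_opposite(p, c) for p in pending)]
    return to_cancel, to_send
```

For example, if OFFLINE to SLAVE is pending on the new node and the controller now computes
SLAVE to OFFLINE for that same node because the old node is back, reconcile returns the pending
transition as the one to cancel and sends only the remaining computed transitions.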
                
> Delayed state transition
> ------------------------
>
>                 Key: HELIX-56
>                 URL: https://issues.apache.org/jira/browse/HELIX-56
>             Project: Apache Helix
>          Issue Type: Task
>            Reporter: kishore gopalakrishna
>
> The requirement from Puneet
> I wanted to know how to implement a specific state machine requirement in Helix.
> Lets say a partition is in the state S2.
> 1. On an instance hosting it going down, the partition moves to state
> S3 (but stays on the same instance).
> 2. If the instance comes back up before a timeout expires, the
> partition moves to state S1 (stays on the same instance).
> 3. If the instance does not come back up before the timeout expiry,
> the partition moves to state S0 (the initial state, on a different
> instance picked up by the controller).
> I have a few questions.
> 1. I believe in order to implement Requirement 1, I have to use the
> CUSTOM rebalancing feature (as otherwise the partitions will get
> assigned to a new node).
> The wiki page says the following about the CUSTOM mode.
> "Applications will have to implement an interface that Helix will
> invoke when the cluster state changes. Within this callback, the
> application can recompute the partition assignment mapping"
> Which interface does one have to implement ?  I am assuming the
> callbacks are triggered inside the controller.
>  2. The transition from S2 -> S3 should not issue a callback on the
> participant (instance) holding that partition. This is because the
> participant is unavailable and so cannot execute the callback. Is this
> doable ?
> 3. One way the time-out (Requirement 3) can be implemented is to
> occasionally trigger IdealState calculation after a time-out and not
> only on liveness changes. Does that sound doable ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
