helix-user mailing list archives

From Puneet Zaroo <puneetza...@gmail.com>
Subject Re: A state transition requirement.
Date Fri, 01 Mar 2013 14:33:47 GMT
Kishore,
Thanks for the prompt reply once again.

On Tue, Feb 26, 2013 at 3:39 PM, kishore g <g.kishore@gmail.com> wrote:
> Hi Puneet,
>
> I was about to reply to your previous email, but I think it's better to have a
> separate thread for each requirement.
>

I agree.

> We already have the ability to do (3), i.e. trigger a rebalance occasionally.
> Take a look at the timer tasks in the controller. But I don't think that will
> be sufficient in your case.
>
> There is another way to solve this which is probably more elegant and easier
> to reason about. Basically, we can introduce a notion of a timed transition (we
> can discuss how to implement this). What this means is that when a node fails,
> Helix can request another node to create the replica, but with additional
> configuration saying that it should be scheduled after a timeout X. We already
> have a notion of cancellable transitions built in. So if the old node comes up
> within that time, Helix can cancel the existing transition and put the old
> node back into the SLAVE state.
>
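
The timed, cancellable transition described above can be sketched in plain Java. This is only an illustration of the idea using java.util.concurrent; it does not use any Helix API, and the class and method names (TimedTransition, schedule, cancel) are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Helix: delays a replica-creation
// transition and lets the caller cancel it if the old node returns in time.
public class TimedTransition {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    // Schedule the transition to run once, after delayMillis milliseconds.
    public synchronized void schedule(Runnable transition, long delayMillis) {
        pending = scheduler.schedule(transition, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Cancel the pending transition. Returns false if nothing was scheduled,
    // or if the task already completed or was already cancelled.
    public synchronized boolean cancel() {
        return pending != null && pending.cancel(false);
    }

    // Stop the scheduler thread so the JVM can exit.
    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

In this sketch the controller side would call schedule() when a node fails and cancel() if the old node reappears before the delay elapses; how that maps onto Helix messages is left open, as in the discussion above.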

The timed transition idea does look promising. I will have to think a
bit more about it.
I had a few more mundane questions.
In the "AUTO" mode (as opposed to the AUTO_REBALANCE mode), the DDS is
responsible for object placement. But how does the DDS implement the
object placement support?

The StateModelDefinition.Builder class allows one to set the
"upperBound" and the "dynamicUpperBound". But how does one specify a
lower bound for a particular state?

Can one safely say that in the "AUTO" mode no partitions will ever be
moved by the controller to a new node, except when the DDS so
desires?
That is, if a node were to go down and come back up, it would still host the
partitions that it had before going down.
Or will a partition move happen only when some constraint is being
violated? E.g. if the minimum number of replicas specified is "2",
then a partition will be assigned to a new node if there are just 2
replicas in the system and one of the nodes goes down.

Thanks again for your replies and for open-sourcing a great tool.

> This design does not require any additional work to handle failures of
> controllers or participants, or any modification to the state model. It is
> basically adding the notion of a timed transition that can be cancelled if
> needed.
>
> What do you think about the solution? Does it make sense?
>
> Regarding implementation, this solution can be implemented in the current
> state by simply adding an additional sleep in the transition (OFFLINE to SLAVE),
> and in the custom code invoker you can first send a cancel message to the
> existing transition and then set the ideal state. But it's possible for Helix
> to cancel it automatically. We need additional logic in Helix so that,
> if there is a pending transition and we compute another transition that
> is the opposite of it, we can automatically detect that it's cancellable and
> cancel the existing transition. That will make it more generic, and we can
> then simply have the transition delay set as a configuration.
>
> thanks,
> Kishore G
>
>
> On Tue, Feb 26, 2013 at 12:12 PM, Puneet Zaroo <puneetzaroo@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I wanted to know how to implement a specific state machine requirement in
>> Helix.
>> Let's say a partition is in the state S2.
>>
>> 1. When the instance hosting it goes down, the partition moves to state
>> S3 (but stays on the same instance).
>> 2. If the instance comes back up before a timeout expires, the
>> partition moves to state S1 (stays on the same instance).
>> 3. If the instance does not come back up before the timeout expiry,
>> the partition moves to state S0 (the initial state, on a different
>> instance picked up by the controller).
>>
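
The three requirements above amount to a small transition table. A minimal sketch in plain Java follows; the state names S0 to S3 come from the message, while the event names and the PartitionStates class are hypothetical, and the sketch does not model which instance hosts the partition.

```java
// Hypothetical illustration of the requested state machine; not Helix code.
public class PartitionStates {
    public enum State { S0, S1, S2, S3 }
    public enum Event { INSTANCE_DOWN, INSTANCE_UP, TIMEOUT_EXPIRED }

    // Returns the next state for a partition, or the current state
    // if the event does not apply to it.
    public static State next(State current, Event event) {
        if (current == State.S2 && event == Event.INSTANCE_DOWN) {
            return State.S3; // requirement 1: stays on the same instance
        }
        if (current == State.S3 && event == Event.INSTANCE_UP) {
            return State.S1; // requirement 2: instance returned in time
        }
        if (current == State.S3 && event == Event.TIMEOUT_EXPIRED) {
            return State.S0; // requirement 3: restart on another instance
        }
        return current;
    }
}
```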
>> I have a few questions.
>>
>> 1. I believe in order to implement Requirement 1, I have to use the
>> CUSTOM rebalancing feature (as otherwise the partitions will get
>> assigned to a new node).
>> The wiki page says the following about the CUSTOM mode.
>>
>> "Applications will have to implement an interface that Helix will
>> invoke when the cluster state changes. Within this callback, the
>> application can recompute the partition assignment mapping"
>>
>> Which interface does one have to implement? I am assuming the
>> callbacks are triggered inside the controller.
>>
>> 2. The transition from S2 -> S3 should not issue a callback on the
>> participant (instance) holding that partition. This is because the
>> participant is unavailable and so cannot execute the callback. Is this
>> doable?
>>
>> 3. One way the timeout (Requirement 3) can be implemented is to
>> occasionally trigger IdealState calculation after a timeout, and not
>> only on liveness changes. Does that sound doable?
>>
>> thanks,
>> - Puneet
>
>
