helix-user mailing list archives

From Puneet Zaroo <puneetza...@gmail.com>
Subject Re: A state transition requirement.
Date Mon, 04 Mar 2013 14:06:57 GMT
Kishore,
Thanks for the helpful pointers as usual. You are correct that the
delayed transition will also delay the normal bootstrap of a node,
which is unacceptable. Thanks for pointing this out.

The idea I had in mind was to extend the notion of "REBALANCE_TIMER"
associated with each resource inside Helix to also support multiple
timers. Each timer would be associated with a node, and would
rebalance partitions hosted on it to other nodes. Supporting this
inside Helix would be too intrusive a change.

So, I could implement this outside of Helix. I would need to implement
something similar to ZKHelixAdmin.rebalance(), but it would be a targeted
rebalance that only moves partitions hosted on a particular node.
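
Concretely, something along these lines is what I am picturing (a rough,
untested sketch; the HelixAdmin / ZNRecord calls are from my reading of the
API, and pickReplacement() is a placeholder for whatever placement policy we
end up using):

import java.util.ArrayList;
import java.util.List;
import org.apache.helix.HelixAdmin;
import org.apache.helix.ZNRecord;
import org.apache.helix.model.IdealState;

// Targeted rebalance of a single resource: only partitions whose preference
// list contains deadNode are re-homed; everything else is left untouched.
// Works by rewriting the idealstate list fields (AUTO mode).
static void drainNode(HelixAdmin admin, String clusterName,
                      String resourceName, String deadNode) {
  IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
  ZNRecord record = idealState.getRecord();
  List<String> candidates = admin.getInstancesInCluster(clusterName);
  for (String partition : idealState.getPartitionSet()) {
    List<String> preferenceList = record.getListField(partition);
    if (preferenceList == null || !preferenceList.contains(deadNode)) {
      continue; // this partition is not hosted on the node being drained
    }
    List<String> updated = new ArrayList<String>(preferenceList);
    // pickReplacement() is hypothetical: choose a live instance that is not
    // already in the preference list (load-aware, rack-aware, etc.).
    updated.set(updated.indexOf(deadNode), pickReplacement(candidates, updated));
    record.setListField(partition, updated);
  }
  admin.setResourceIdealState(clusterName, resourceName, idealState);
}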

thanks,
- Puneet

On Sun, Mar 3, 2013 at 8:47 PM, kishore g <g.kishore@gmail.com> wrote:
> Hi Puneet,
>
> Your explanation is correct.
>
> Regarding the race condition, yes, it's possible that N1 finishes its
> transition before receiving the cancellation. But then Helix will send an
> opposite transition (SLAVE to OFFLINE) to N1. That's the best we can do.
>
> Yes, the support for conflicting transitions needs to be built. Currently we
> only have the ability to manually cancel a transition; we need support for
> canceling conflicting transitions. Let's file a JIRA and flesh out the
> design.
>
> By the way, let me know about the other ideas you had. It's good to have
> multiple options and discuss the pros and cons. For example, the problem
> with the delayed transition is that it might add some delay during cluster
> start-up.
>
> thanks,
> Kishore G
>
>
>
>
>
>
> On Sun, Mar 3, 2013 at 8:02 PM, Puneet Zaroo <puneetzaroo@gmail.com> wrote:
>>
>> Kishore,
>>
>> Over the weekend I had some other thoughts on how to implement this.
>> But thinking some more about it, the timed transition idea looks like
>> the one that requires the least intrusive changes to Helix. Please let
>> me step through it slowly to make sure I understand it.
>>
>> Let's say node N0 goes down and the partitions on it are moved to N1.
>> N1 receives the callback for the OFFLINE to SLAVE transition, but this
>> transition has a configurable delay in it and so does not complete
>> immediately.
>>
>> In the meantime, node N0 comes back up, so the idealState is
>> recalculated in the CustomCodeInvoker to move the partitions of N0
>> back to it. This will make Helix cancel all other conflicting
>> transitions. Does this cancellation get propagated to N1 (which is
>> inside the OFFLINE to SLAVE transition)? This seems a bit racy. What if
>> N1 had finished its transition just before receiving the cancellation?
>>
>> And if I understand correctly, the support for cancelling conflicting
>> transitions needs to be built.
>>
>> Thanks,
>> - Puneet
>>
>>
>>
>> On Fri, Mar 1, 2013 at 7:33 AM, kishore g <g.kishore@gmail.com> wrote:
>> > Hi Puneet,
>> >
>> > Your understanding of AUTO mode is correct: no partitions will ever be
>> > moved by the controller to a new node. And if a node comes back up, it
>> > will still host the partitions it had before going down.
>> >
>> > This is how it works:
>> > in AUTO_REBALANCE mode, Helix has full control, so it will create new
>> > replicas and assign states as needed.
>> >
>> > In AUTO mode, it will not create new replicas unless the idealstate is
>> > changed externally (this can happen when you add new boxes).
>> >
>> >>>Or will the partition move only happen when some constraints are being
>> >>>violated. E.g. if the minimum number of replicas specified is "2",
>> >>>then a partition will be assigned to a new node if there are just 2
>> >>>replicas in the system and one of the nodes goes down.
>> >
>> > In AUTO mode, Helix will try to satisfy the constraints with existing
>> > replicas, so if you had assigned 2 replicas but 1 is down, it will see
>> > what is the best it can do with that 1 replica. That is where the
>> > priority of states comes into the picture: you specify that MASTER is
>> > more important than SLAVE, so it will make that replica a master.
>> >
>> > In AUTO_REBALANCE mode, it would create that replica on another node.
>> > This mode is generally suited for stateless systems, where moving a
>> > partition might simply mean moving processing and not data.
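>> >
>> > (For reference, the mode is picked when the resource is added; the
>> > overload below is from the HelixAdmin interface as far as I remember,
>> > and the variable names are placeholders, so double-check the exact
>> > signature:)
>> >
>> > HelixAdmin admin = new ZKHelixAdmin(zkAddress);
>> > // AUTO: Helix assigns states among the instances listed in the idealstate.
>> > // AUTO_REBALANCE: Helix also decides placement. CUSTOMIZED: the app decides both.
>> > admin.addResource(clusterName, "myResource", numPartitions, "MasterSlave",
>> >     IdealState.IdealStateModeProperty.AUTO.toString());
>> > // computes the initial placement for the AUTO-mode idealstate
>> > admin.rebalance(clusterName, "myResource", replicaCount);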
>> >
>> > Thanks,
>> > Kishore G
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Mar 1, 2013 at 6:33 AM, Puneet Zaroo <puneetzaroo@gmail.com>
>> > wrote:
>> >>
>> >> Kishore,
>> >> Thanks for the prompt reply once again.
>> >>
>> >> On Tue, Feb 26, 2013 at 3:39 PM, kishore g <g.kishore@gmail.com> wrote:
>> >> > Hi Puneet,
>> >> >
>> >> > I was about to reply to your previous email, but I think it's better to
>> >> > have a separate thread for each requirement.
>> >> >
>> >>
>> >> I agree.
>> >>
>> >> > We already have the ability to trigger a rebalance occasionally (your
>> >> > point 3). Take a look at the timer tasks in the controller. But I don't
>> >> > think that will be sufficient in your case.
>> >> >
>> >> > There is another way to solve this which is probably easier to reason
>> >> > about and more elegant. Basically, we can introduce the notion of a
>> >> > timed transition (we can discuss how to implement this). What this means
>> >> > is that when a node fails, Helix can request another node to create the
>> >> > replica, but with additional configuration that it should be scheduled
>> >> > after an X timeout; we already have a notion of cancellable transitions
>> >> > built in. So if the old node comes up within that time, Helix can cancel
>> >> > the existing transition and put the old node back into the SLAVE state.
>> >> >
>> >>
>> >> The timed transition idea does look promising. I will have to think a
>> >> bit more about it.
>> >> I had a few more mundane questions.
>> >> In the "AUTO" mode (as opposed to the AUTO_REBALANCE mode), the DDS is
>> >> responsible for object placement. But how does the DDS implement the
>> >> object placement support?
>> >>
>> >> The StateModelDefinition.Builder() class allows one to set the
>> >> "upperBound" and the "dynamicUpperBound". But how does one specify a
>> >> lower bound for a particular state?
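>> >>
>> >> For concreteness, this is roughly the Builder usage I have in mind (the
>> >> state names and bounds are placeholders, and the method names are just
>> >> my reading of the javadocs):
>> >>
>> >> StateModelDefinition.Builder builder =
>> >>     new StateModelDefinition.Builder("MasterSlave");
>> >> builder.addState("MASTER", 1);           // lower number = higher priority
>> >> builder.addState("SLAVE", 2);
>> >> builder.addState("OFFLINE", 3);
>> >> builder.initialState("OFFLINE");
>> >> builder.addTransition("OFFLINE", "SLAVE");
>> >> builder.addTransition("SLAVE", "MASTER");
>> >> builder.addTransition("MASTER", "SLAVE");
>> >> builder.addTransition("SLAVE", "OFFLINE");
>> >> builder.upperBound("MASTER", 1);         // at most one master per partition
>> >> builder.dynamicUpperBound("SLAVE", "R"); // "R" = replica count of the resource
>> >> StateModelDefinition stateModelDef = builder.build();
>> >> // ... but I do not see a corresponding lowerBound() method.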
>> >>
>> >> Can one safely say that in the "AUTO" mode no partitions will ever be
>> >> moved by the controller to a new node, except when the DDS so desires?
>> >> If a node were to go down and come back up, it will still host the
>> >> partitions that it had before going down.
>> >> Or will a partition move only happen when some constraints are being
>> >> violated? E.g., if the minimum number of replicas specified is "2",
>> >> then a partition will be assigned to a new node if there are just 2
>> >> replicas in the system and one of the nodes goes down.
>> >>
>> >> Thanks again for your replies and for open-sourcing a great tool.
>> >>
>> >> > This design does not require any additional work to handle failures of
>> >> > controllers or participants, nor any modification to the state model.
>> >> > It's basically adding the notion of a timed transition that can be
>> >> > cancelled if needed.
>> >> >
>> >> > What do you think about the solution? Does it make sense?
>> >> >
>> >> > Regarding implementation, this solution can be implemented in the
>> >> > current state by simply adding an additional sleep in the transition
>> >> > (OFFLINE to SLAVE), and in the custom code invoker you can first send a
>> >> > cancel message to the existing transition and then set the ideal state.
>> >> > But it's possible for Helix to automatically cancel it. We need
>> >> > additional logic in Helix so that if there is a pending transition and
>> >> > we compute another transition that is the opposite of it, we can
>> >> > automatically detect that it is cancellable and cancel the existing
>> >> > transition. That will make it more generic, and we can then simply have
>> >> > the transition delay set as a configuration.
>> >> >
>> >> > thanks,
>> >> > Kishore G
>> >> >
>> >> >
>> >> > On Tue, Feb 26, 2013 at 12:12 PM, Puneet Zaroo
>> >> > <puneetzaroo@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I wanted to know how to implement a specific state machine requirement
>> >> >> in Helix.
>> >> >> Let's say a partition is in state S2.
>> >> >>
>> >> >> 1. When the instance hosting it goes down, the partition moves to state
>> >> >> S3 (but stays on the same instance).
>> >> >> 2. If the instance comes back up before a timeout expires, the
>> >> >> partition moves to state S1 (stays on the same instance).
>> >> >> 3. If the instance does not come back up before the timeout expires,
>> >> >> the partition moves to state S0 (the initial state, on a different
>> >> >> instance picked up by the controller).
>> >> >>
>> >> >> I have a few questions.
>> >> >>
>> >> >> 1. I believe in order to implement Requirement 1, I have to use the
>> >> >> CUSTOM rebalancing feature (as otherwise the partitions will get
>> >> >> assigned to a new node).
>> >> >> The wiki page says the following about the CUSTOM mode.
>> >> >>
>> >> >> "Applications will have to implement an interface that Helix will
>> >> >> invoke when the cluster state changes. Within this callback, the
>> >> >> application can recompute the partition assignment mapping"
>> >> >>
>> >> >> Which interface does one have to implement? I am assuming the
>> >> >> callbacks are triggered inside the controller.
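>> >> >>
>> >> >> From poking around the code, I am guessing it is something like the
>> >> >> following (the class and method names here are just my reading of
>> >> >> org.apache.helix.participant, so please correct me if I am off):
>> >> >>
>> >> >> import org.apache.helix.NotificationContext;
>> >> >> import org.apache.helix.participant.CustomCodeCallbackHandler;
>> >> >>
>> >> >> public class PlacementRecomputer implements CustomCodeCallbackHandler {
>> >> >>   @Override
>> >> >>   public void onCallback(NotificationContext context) {
>> >> >>     // Recompute the partition -> instance mapping here and write it back
>> >> >>     // into the idealstate (e.g. via HelixAdmin.setResourceIdealState).
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> // ... registered through HelixCustomCodeRunner so that exactly one node
>> >> >> // runs it on liveness changes. Is that the intended usage?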
>> >> >>
>> >> >> 2. The transition from S2 -> S3 should not issue a callback on the
>> >> >> participant (instance) holding that partition. This is because the
>> >> >> participant is unavailable and so cannot execute the callback. Is this
>> >> >> doable?
>> >> >>
>> >> >> 3. One way the time-out (Requirement 3) can be implemented is to
>> >> >> occasionally trigger IdealState calculation after a time-out and not
>> >> >> only on liveness changes. Does that sound doable?
>> >> >>
>> >> >> thanks,
>> >> >> - Puneet
>> >> >
>> >> >
>> >
>> >
>
>
