helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: A state transition requirement.
Date Mon, 04 Mar 2013 04:47:31 GMT
Hi Puneet,

Your explanation is correct.

Regarding the race condition, yes, it's possible that N1 finished its
transition before receiving the cancellation. But then Helix will send the
opposite transition, SLAVE to OFFLINE, to N1. That's the best we can do.

Yes, the support for conflicting transitions needs to be built. Currently we
only have the ability to manually cancel a transition. We need support
for canceling conflicting transitions automatically. Let's file a JIRA and
flesh out the design.
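
For reference, a rough sketch of what the manual cancellation could look
like today, using the generic messaging service (the CANCEL_TRANSITION
subtype and the participant-side handling of it are assumptions, not an
existing contract):

  import java.util.UUID;
  import org.apache.helix.Criteria;
  import org.apache.helix.HelixManager;
  import org.apache.helix.InstanceType;
  import org.apache.helix.model.Message;
  import org.apache.helix.model.Message.MessageType;

  void sendCancel(HelixManager manager) {
    // User-defined message; the participant's own code must recognize
    // the subtype and interrupt its in-flight transition.
    Message cancelMsg =
        new Message(MessageType.USER_DEFINE_MSG, UUID.randomUUID().toString());
    cancelMsg.setMsgSubType("CANCEL_TRANSITION"); // hypothetical subtype

    Criteria recipient = new Criteria();
    recipient.setInstanceName("N1");
    recipient.setRecipientInstanceType(InstanceType.PARTICIPANT);
    recipient.setResource("myDB");       // illustrative resource name
    recipient.setPartition("myDB_0");    // illustrative partition name
    recipient.setSessionSpecific(true);

    manager.getMessagingService().send(recipient, cancelMsg);
  }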

By the way, let me know about the other ideas you had. It's good to have
multiple options and discuss the pros and cons. For example, the problem
with a delayed transition is that it might add some delay during cluster
start-up.

thanks,
Kishore G

On Sun, Mar 3, 2013 at 8:02 PM, Puneet Zaroo <puneetzaroo@gmail.com> wrote:

> Kishore,
>
> Over the weekend I had some other thoughts of how to implement this.
> But thinking some more about it, the timed transition idea looks like
> the one that requires less intrusive changes to Helix. But please let
> me step through it slowly to understand it more.
>
> Let's say node N0 goes down and the partitions on it are moved to N1.
> Let's say N1 receives the callback for the OFFLINE to SLAVE
> transition... but this transition has a configurable delay in it, and
> so does not complete immediately.
>
> In the meantime, node N0 comes back up, so the idealState is
> recalculated in the CustomCodeInvoker to move the partitions of N0
> back to it. This will make Helix cancel all other conflicting
> transitions. Does this cancellation get propagated to N1 (which is
> inside the OFFLINE to SLAVE transition)? This seems a bit racy. What if
> N1 had finished its transition just before receiving the cancellation?
>
> And if I understand correctly, the support for cancelling conflicting
> transitions needs to be built.
>
> Thanks,
> - Puneet
>
> On Fri, Mar 1, 2013 at 7:33 AM, kishore g <g.kishore@gmail.com> wrote:
> > Hi Puneet,
> >
> > Your understanding of AUTO mode is correct: no partitions will ever be
> > moved by the controller to a new node. And if a node comes back up, it
> > will still host the partitions it had before going down.
> >
> > This is how it works:
> > in AUTO_REBALANCE, Helix has full control, so it will create new
> > replicas and assign states as needed.
> >
> > in AUTO mode, it will not create new replicas unless the idealstate is
> > changed externally (this can happen when you add new boxes).
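> >
> > For example (a sketch; the cluster and resource names are made up), the
> > mode is chosen when the resource is added:
> >
> >   import org.apache.helix.HelixAdmin;
> >   import org.apache.helix.manager.zk.ZKHelixAdmin;
> >
> >   HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
> >   // AUTO: Helix assigns states, but placement follows the idealstate,
> >   // which only changes when something rewrites it externally.
> >   admin.addResource("MYCLUSTER", "myDB", 8, "MasterSlave", "AUTO");
> >   // AUTO_REBALANCE: Helix also controls placement and will create a
> >   // replacement replica elsewhere when a node dies:
> >   // admin.addResource("MYCLUSTER", "myDB", 8, "MasterSlave", "AUTO_REBALANCE");
> >   admin.rebalance("MYCLUSTER", "myDB", 2); // 2 replicas per partition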
> >
> >>>Or will the partition move only happen when some constraints are being
> >>>violated. E.g. if the minimum number of replicas specified is "2",
> >>>then a partition will be assigned to a new node if there are just 2
> >>>replicas in the system and one of the nodes goes down.
> >
> > In AUTO mode, Helix will try to satisfy the constraints with the
> > existing replicas, so if you had assigned 2 replicas but 1 is down, it
> > will see what's the best it can do with that 1 replica. That's where the
> > priority of states comes into the picture: you specify that master is
> > more important than slave, so it will make that replica a master.
> >
> > In AUTO_REBALANCE it would create that replica on another node. This
> > mode is generally suited for stateless systems, where moving a partition
> > might simply mean moving processing and not data.
> >
> > Thanks,
> > Kishore G
> >
> > On Fri, Mar 1, 2013 at 6:33 AM, Puneet Zaroo <puneetzaroo@gmail.com>
> > wrote:
> >>
> >> Kishore,
> >> Thanks for the prompt reply once again.
> >>
> >> On Tue, Feb 26, 2013 at 3:39 PM, kishore g <g.kishore@gmail.com> wrote:
> >> > Hi Puneet,
> >> >
> >> > I was about to reply to your previous email, but I think it's better
> >> > to have a separate thread for each requirement.
> >> >
> >>
> >> I agree.
> >>
> >> > We already have the ability (your point 3) to trigger a rebalance
> >> > occasionally. Take a look at the timer tasks in the controller. But I
> >> > don't think that will be sufficient in your case.
> >> >
> >> > There is another way to solve this which is probably easier to reason
> >> > about, and elegant. Basically, we can introduce a notion of a timed
> >> > transition (we can discuss how to implement this). What this means is:
> >> > when a node fails, Helix can request another node to create the
> >> > replica, but with additional configuration that it should be scheduled
> >> > after a timeout of X. We already have a notion of cancellable
> >> > transitions built in, so if the old node comes up within that time,
> >> > Helix can cancel the existing transition and put the old node back
> >> > into the SLAVE state.
> >> >
> >>
> >> The timed transition idea does look promising. I will have to think a
> >> bit more about it.
> >> I had a few more mundane questions.
> >> In the "AUTO" mode (as opposed to the AUTO_REBALANCE mode), the DDS is
> >> responsible for object placement. But how does the DDS implement the
> >> object placement support?
> >>
> >> The StateModelDefinition.Builder() class allows one to set the
> >> "upperBound" and the "dynamicUpperBound". But how does one specify a
> >> lower bound for a particular state?
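> >>
> >> For concreteness, this is the kind of definition I mean (a sketch of
> >> the builder usage; the states and bounds are illustrative):
> >>
> >>   import org.apache.helix.model.StateModelDefinition;
> >>
> >>   StateModelDefinition.Builder builder =
> >>       new StateModelDefinition.Builder("MasterSlave");
> >>   builder.addState("MASTER", 1);            // priority 1 = highest
> >>   builder.addState("SLAVE", 2);
> >>   builder.addState("OFFLINE");
> >>   builder.initialState("OFFLINE");
> >>   builder.addTransition("OFFLINE", "SLAVE");
> >>   builder.addTransition("SLAVE", "MASTER");
> >>   builder.addTransition("MASTER", "SLAVE");
> >>   builder.addTransition("SLAVE", "OFFLINE");
> >>   builder.upperBound("MASTER", 1);          // at most one master
> >>   builder.dynamicUpperBound("SLAVE", "R");  // R = replica count
> >>   StateModelDefinition def = builder.build();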
> >>
> >> Can one safely say that in the "AUTO" mode no partitions will ever be
> >> moved by the controller to a new node, except when the DDS so
> >> desires?
> >> If a node were to go down and come back up, it will still host the
> >> partitions that it had before going down.
> >> Or will the partition move only happen when some constraints are being
> >> violated? E.g. if the minimum number of replicas specified is "2",
> >> then a partition will be assigned to a new node if there are just 2
> >> replicas in the system and one of the nodes goes down.
> >>
> >> Thanks again for your replies and for open-sourcing a great tool.
> >>
> >> > This design does not require any additional work to handle failures
> >> > of controllers or participants, nor any modification to the state
> >> > model. It's basically adding the notion of a timed transition that can
> >> > be cancelled if needed.
> >> >
> >> > What do you think about the solution? Does it make sense ?
> >> >
> >> > Regarding implementation, this solution can be implemented in the
> >> > current state by simply adding an additional sleep in the transition
> >> > (OFFLINE to SLAVE); in the custom code invoker, you can first send a
> >> > cancel message to the existing transition and then set the ideal
> >> > state. But it's possible for Helix to cancel it automatically. We need
> >> > additional logic in Helix so that if there is a pending transition and
> >> > we compute another transition that is the opposite of it, we can
> >> > automatically detect that it's cancellable and cancel the existing
> >> > transition. That will make it more generic, and we can then simply
> >> > have the transition delay set as a configuration.
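> >> >
> >> > Roughly, the OFFLINE to SLAVE handler could look like this (a sketch
> >> > only; the cancel hook is an assumption, not something Helix wires up
> >> > for you today):
> >> >
> >> >   import java.util.concurrent.CountDownLatch;
> >> >   import java.util.concurrent.TimeUnit;
> >> >   import org.apache.helix.NotificationContext;
> >> >   import org.apache.helix.model.Message;
> >> >   import org.apache.helix.participant.statemachine.StateModel;
> >> >   import org.apache.helix.participant.statemachine.StateModelInfo;
> >> >   import org.apache.helix.participant.statemachine.Transition;
> >> >
> >> >   @StateModelInfo(initialState = "OFFLINE",
> >> >                   states = { "MASTER", "SLAVE", "OFFLINE" })
> >> >   public class DelayedMasterSlaveModel extends StateModel {
> >> >     private final long delayMs;
> >> >     private final CountDownLatch cancelled = new CountDownLatch(1);
> >> >
> >> >     public DelayedMasterSlaveModel(long delayMs) {
> >> >       this.delayMs = delayMs;
> >> >     }
> >> >
> >> >     @Transition(from = "OFFLINE", to = "SLAVE")
> >> >     public void onBecomeSlaveFromOffline(Message message,
> >> >         NotificationContext context) throws InterruptedException {
> >> >       // Wait out the configured delay; a cancel fires the latch and
> >> >       // aborts the transition before the replica is created.
> >> >       if (cancelled.await(delayMs, TimeUnit.MILLISECONDS)) {
> >> >         throw new InterruptedException("transition cancelled");
> >> >       }
> >> >       // ... actually bring up the replica here ...
> >> >     }
> >> >
> >> >     // Hypothetical hook, to be called by a user-defined message
> >> >     // handler when the cancel message arrives.
> >> >     public void cancelPendingTransition() {
> >> >       cancelled.countDown();
> >> >     }
> >> >   }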
> >> >
> >> > thanks,
> >> > Kishore G
> >> >
> >> >
> >> > On Tue, Feb 26, 2013 at 12:12 PM, Puneet Zaroo <puneetzaroo@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I wanted to know how to implement a specific state machine
> >> >> requirement in Helix.
> >> >> Let's say a partition is in the state S2.
> >> >>
> >> >> 1. When the instance hosting it goes down, the partition moves to
> >> >> state S3 (but stays on the same instance).
> >> >> 2. If the instance comes back up before a timeout expires, the
> >> >> partition moves to state S1 (stays on the same instance).
> >> >> 3. If the instance does not come back up before the timeout expires,
> >> >> the partition moves to state S0 (the initial state, on a different
> >> >> instance picked by the controller).
> >> >>
> >> >> I have a few questions.
> >> >>
> >> >> 1. I believe in order to implement Requirement 1, I have to use the
> >> >> CUSTOM rebalancing feature (as otherwise the partitions will get
> >> >> assigned to a new node).
> >> >> The wiki page says the following about the CUSTOM mode.
> >> >>
> >> >> "Applications will have to implement an interface that Helix will
> >> >> invoke when the cluster state changes. Within this callback, the
> >> >> application can recompute the partition assignment mapping"
> >> >>
> >> >> Which interface does one have to implement? I am assuming the
> >> >> callbacks are triggered inside the controller.
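> >> >>
> >> >> Something like this is what I imagine (a sketch; I have not verified
> >> >> the exact interface names):
> >> >>
> >> >>   import org.apache.helix.HelixManager;
> >> >>   import org.apache.helix.NotificationContext;
> >> >>   import org.apache.helix.participant.CustomCodeCallbackHandler;
> >> >>
> >> >>   public class PlacementRecomputer implements CustomCodeCallbackHandler {
> >> >>     @Override
> >> >>     public void onCallback(NotificationContext context) {
> >> >>       HelixManager manager = context.getManager();
> >> >>       // Recompute the partition-to-instance mapping here and write
> >> >>       // it back as the new idealstate for the resource. Registered
> >> >>       // via HelixCustomCodeRunner so that exactly one node in the
> >> >>       // cluster runs it on liveness changes.
> >> >>     }
> >> >>   }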
> >> >>
> >> >> 2. The transition from S2 -> S3 should not issue a callback on the
> >> >> participant (instance) holding that partition. This is because the
> >> >> participant is unavailable and so cannot execute the callback. Is
> >> >> this doable?
> >> >>
> >> >> 3. One way the time-out (Requirement 3) can be implemented is to
> >> >> occasionally trigger the IdealState calculation after a time-out, and
> >> >> not only on liveness changes. Does that sound doable?
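> >> >>
> >> >> E.g., on the controller side, something as simple as this (a sketch;
> >> >> recomputeIdealState stands for whatever the liveness callback
> >> >> already does):
> >> >>
> >> >>   import java.util.concurrent.Executors;
> >> >>   import java.util.concurrent.ScheduledExecutorService;
> >> >>   import java.util.concurrent.TimeUnit;
> >> >>
> >> >>   ScheduledExecutorService timer =
> >> >>       Executors.newSingleThreadScheduledExecutor();
> >> >>   timer.scheduleWithFixedDelay(new Runnable() {
> >> >>     public void run() {
> >> >>       recomputeIdealState(); // hypothetical shared helper
> >> >>     }
> >> >>   }, 30, 30, TimeUnit.SECONDS);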
> >> >>
> >> >> thanks,
> >> >> - Puneet
> >> >
> >> >
> >
> >
>
