helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: A few questions about helix.
Date Tue, 26 Feb 2013 21:37:50 GMT
Hi Puneet,

CustomCodeInvoker can run in either the controller or a participant, but the
more appealing use case is running it in a participant.

I had implemented a way for the controller to query the participants before
calculating the ideal state, but I had to remove it because one of the
libraries we used was not Apache-licensed. We used it for exactly the same
requirement of choosing the master, but instead of asking the participants
to run the election, we asked them to update their SCN in ZK, and based on
that we reordered the preference list in the IdealState dynamically.
Another reason I removed it was that, even though the idea was good, I
didn't like the implementation.
Given that you have a similar requirement, it's probably a good idea to
brainstorm multiple solutions and come up with an elegant one.
I still like solving it via a state machine abstraction with states like
LEADER_ELECTION and MASTER_READY.
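The SCN-based reordering described above can be expressed as a pure function over data the participants have already written to ZK. This is a minimal sketch, not the removed implementation: the function name and inputs are hypothetical, and it assumes that the first live instance in a partition's preference list is the one chosen as MASTER.

```python
def reorder_preference_list(preference_list, scn_by_instance, live_instances):
    """Hypothetical helper: move the live replica with the highest reported SCN
    to the front of one partition's preference list.

    preference_list -- instance names for one partition, in current order
    scn_by_instance -- latest SCN each participant published in ZK (assumed input)
    live_instances  -- set of instances currently alive
    """
    live = [i for i in preference_list if i in live_instances]
    dead = [i for i in preference_list if i not in live_instances]
    # Highest SCN first; Python's sort stays stable with reverse=True,
    # so replicas with equal SCNs keep their original relative order.
    live.sort(key=lambda i: scn_by_instance.get(i, -1), reverse=True)
    # Dead instances stay at the tail so they can rejoin later.
    return live + dead


print(reorder_preference_list(["n1", "n2", "n3"],
                              {"n1": 100, "n2": 250, "n3": 200},
                              {"n1", "n3"}))
# ['n3', 'n1', 'n2']
```

Keeping the election as a deterministic function of published state means any process that reads the same ZK data computes the same ordering.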

Another design principle of Helix is that the controller does not talk to
participants directly. Direct communication would work, and would probably
be fast, for small clusters, but as the cluster grows large the controller
would become the bottleneck. We want to use the push/pull model, where the
controller pushes to ZK and participants pull from ZK. This makes the
solution fault tolerant and extensible. We have put a lot of enhancements
into Helix to use ZK in an optimal way.
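To make the push/pull idea concrete, here is a toy, in-memory model. `ToyZk`, `Participant`, and the path layout are hypothetical stand-ins, not ZooKeeper or Helix APIs; the one-shot watches only loosely mimic ZooKeeper semantics. The point is that the writer (the controller) only updates the store, and each participant is notified and pulls the data itself.

```python
class ToyZk:
    """In-memory stand-in for ZooKeeper (hypothetical, for illustration only):
    path -> (version, data), with one-shot watches loosely mimicking ZK."""

    def __init__(self):
        self.nodes = {}
        self.watches = {}

    def set(self, path, data):
        version = self.nodes.get(path, (-1, None))[0] + 1
        self.nodes[path] = (version, data)
        # Push: the writer only updates the store; watchers are merely told
        # that something changed and must pull the data themselves.
        for callback in self.watches.pop(path, []):
            callback(path)

    def get(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes.get(path, (-1, None))


class Participant:
    def __init__(self, name, zk):
        self.name, self.zk = name, zk
        self.path = f"/toy-cluster/{name}/MESSAGES"  # hypothetical layout
        self.seen = []
        self.pull(self.path)

    def pull(self, path):
        # Pull: read the latest data and re-arm the one-shot watch.
        _version, data = self.zk.get(path, watch=self.pull)
        if data is not None:
            self.seen.append(data)


zk = ToyZk()
p1, p2 = Participant("node1", zk), Participant("node2", zk)
# The "controller" only writes to the store; it never calls a participant.
zk.set("/toy-cluster/node1/MESSAGES", "OFFLINE->SLAVE")
zk.set("/toy-cluster/node1/MESSAGES", "SLAVE->MASTER")
print(p1.seen, p2.seen)  # ['OFFLINE->SLAVE', 'SLAVE->MASTER'] []
```

Because the controller's work per change is a single write, its cost does not grow with the number of readers.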

I see that you have started another thread for creating slave after some
time. Lets continue the discussion in that thread.

thanks,
Kishore G

On Sun, Feb 24, 2013 at 3:47 PM, Puneet Zaroo <puneetzaroo@gmail.com> wrote:

> Kishore,
> Thanks for the helpful and detailed answers once again.
>
> On Sun, Feb 24, 2013 at 8:45 AM, kishore g <g.kishore@gmail.com> wrote:
> > 3) Regarding overhead in case of too many spectators.
> > Do you mean overhead in terms of the controller informing the spectators?
> > The controller does not communicate directly with the spectators. All
> > communication is via ZooKeeper. It's more like a push/pull model where
> > the controller pushes to ZK and spectators pull from ZK. This is an
> > important difference from other systems where controllers communicate
> > directly with other components in the system. This allows us to scale
> > the system and not be bottlenecked by the controller. Eventually ZK might
> > become a bottleneck, but for spectators we can easily scale reads on ZK
> > by adding more ZK observers. In fact, if the system has a lot of
> > spectators, it's better to connect them only to ZK observers. Apart from
> > that, Helix has a group commit feature where transitions are grouped
> > together, which reduces the number of notifications to spectators.
> >
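The effect of group commit on notification counts can be illustrated with a small batching sketch. This is not Helix's implementation; `GroupCommitter` and `publish` are hypothetical names, and the sketch only shows why merging several transitions into one write produces one notification instead of many.

```python
class GroupCommitter:
    """Hypothetical sketch (not Helix's implementation): buffer per-partition
    state updates and publish them as one write, so a watcher sees one
    notification per batch instead of one per transition."""

    def __init__(self, publish):
        self.publish = publish  # callable that performs one write, e.g. to ZK
        self.pending = {}

    def update(self, partition, state):
        # A later update to the same partition replaces the earlier one.
        self.pending[partition] = state

    def commit(self):
        if self.pending:
            batch, self.pending = self.pending, {}
            self.publish(batch)


notifications = []
committer = GroupCommitter(notifications.append)
for i in range(5):
    committer.update(f"db_{i}", "SLAVE")
committer.update("db_0", "MASTER")
committer.commit()
print(len(notifications), notifications[0]["db_0"])  # 1 MASTER
```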
>
> Thanks for the clarification.
>
> > 2) We don't have the feature to wait for a configurable time before
> > selecting another slave partition. We have been asked for this feature
> > many times; we should probably add it :-). However, we do have another
> > feature which might actually be useful and more elegant. You can
> > pause/unpause the controller. When the controller is paused, no
> > transitions will occur in the system. Is this something that would be
> > useful? The pause/unpause is at the cluster level.
> >
>
> The global pause does not work for us, as we want other transitions,
> e.g. slave -> master, to keep happening.
> In more detail, the requirement is this:
>
> When a node hosting a partition in the SLAVE state becomes
> unreachable, the partition should not be immediately assigned to a new
> node in the OFFLINE state. Let's say there is an additional state into
> which the partition can go, the DOWN state, but the partition
> is not reassigned. It stays on the same node for a configurable
> timeout. If the node comes back up before the timeout expires, the
> partition transitions back to the SLAVE state on the same node. If the
> node remains down past the timeout, the partition is reassigned to a
> new node and enters the OFFLINE state.
>
> How much work would it be to add such support to Helix?
>
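The DOWN-state requirement above is essentially a per-replica timer. The following sketch models just that bookkeeping; `Replica`, `DOWN_TIMEOUT_S`, `pick_new_node`, and the function names are hypothetical, and time is passed in explicitly rather than read from a clock so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

DOWN_TIMEOUT_S = 300.0  # hypothetical configurable timeout, in seconds


@dataclass
class Replica:
    partition: str
    node: str
    state: str = "SLAVE"
    down_since: Optional[float] = None


def on_node_down(replica: Replica, now: float) -> None:
    # Keep the replica assigned to the same node, mark it DOWN, start the clock.
    replica.state, replica.down_since = "DOWN", now


def on_node_up(replica: Replica) -> None:
    # The node came back before reassignment: return to SLAVE in place.
    if replica.state == "DOWN":
        replica.state, replica.down_since = "SLAVE", None


def check_timeouts(replicas: List[Replica], now: float,
                   pick_new_node: Callable[[Replica], str],
                   timeout: float = DOWN_TIMEOUT_S) -> List[str]:
    """Reassign replicas that stayed DOWN past the timeout; they restart OFFLINE."""
    moved = []
    for r in replicas:
        if r.state == "DOWN" and now - r.down_since >= timeout:
            r.node = pick_new_node(r)
            r.state, r.down_since = "OFFLINE", None
            moved.append(r.partition)
    return moved


a, b = Replica("db_0", "node1"), Replica("db_1", "node2")
on_node_down(a, now=0.0)
on_node_down(b, now=0.0)
on_node_up(b)  # node2 rebooted quickly
print(check_timeouts([a, b], now=100.0, pick_new_node=lambda r: "node3"))  # []
print(check_timeouts([a, b], now=301.0, pick_new_node=lambda r: "node3"))  # ['db_0']
print(a.node, a.state, b.node, b.state)  # node3 OFFLINE node2 SLAVE
```

Only the replica whose node stayed away past the timeout is moved; the rebooted node keeps its partition.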
>
> > 1) CustomCodeInvoker example:
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=blob;f=helix-core/src/test/java/org/apache/helix/integration/TestHelixCustomCodeRunner.java;h=9bf79b8b34c14b7ce1e3fc45a45ceb19fdac4874;hb=437eb42e
> >
> Perhaps a very newbie question:
> does the CustomCodeInvoker run on the controller or on the participants?
> I initially thought it ran on the controller and allowed one to extend
> the controller, but it seems to run on the participants. I may be
> wrong.
>
> > Regarding the LEADER_ELECTION state model: I see what you mean, and this
> > is actually a very nice and cool idea. I follow the part up to all
> > participants getting into the LEADER_ELECTION state and one of them being
> > selected as the master. What happens after that?
> >
> > a. What will be the outcome of the SLAVE-->LEADER_ELECTION transition on
> > each participant?
> > b. What will be the new IdealState that allows one of the participants
> > to become MASTER and the others SLAVE?
> >
>
> These are pretty preliminary ideas on my part, and perhaps there are
> better ways of doing this in Helix. I was thinking the participant
> elected as the "MASTER" sets the IdealState, in which it specifies
> itself as the MASTER for that partition and the other nodes as
> "SLAVE". So this participant undergoes a "LEADER_ELECTION" ->
> "MASTER" transition while the other participants undergo a
> "LEADER_ELECTION" -> "SLAVE" transition, with the transitions being
> sent out by the controller. Only the IdealState is specified by
> the winning participant. Again, I am not sure this is practical.
>
> > Another feature I thought about some time back is conditional
> > transitions: basically, a transition that can have two outcomes. In this
> > case we can have something like LEADER_ELECTION -> MASTER_READY or SLAVE,
> > do the election in that transition, and either go to MASTER_READY or go
> > back to the SLAVE state. Helix can then promote MASTER_READY to MASTER.
> > We might need some changes in Helix, but it looks doable. We should file
> > a JIRA for this feature and track this discussion there.
> >
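A conditional LEADER_ELECTION transition could be modelled as a participant-side handler that returns one of two outcomes, plus a controller-side step that promotes MASTER_READY to MASTER. In this sketch the handler names are hypothetical, and it assumes every participant reads the same SCN snapshot from ZK (as in the SCN-based approach mentioned earlier in the thread), so each one reaches the same decision locally and the controller does not need to query them.

```python
def leader_election_transition(me, scn_by_instance):
    """Hypothetical handler for a conditional LEADER_ELECTION transition that
    runs on each participant. All participants read the same shared data
    (SCNs assumed to be published in ZK), so they reach the same decision
    without talking to each other."""
    # Highest SCN wins; ties are broken by instance name so the result is deterministic.
    winner = max(scn_by_instance, key=lambda i: (scn_by_instance[i], i))
    return "MASTER_READY" if me == winner else "SLAVE"


def promote(states):
    """Controller side: promote at most one MASTER_READY replica to MASTER."""
    new_states = dict(states)
    ready = sorted(i for i, s in states.items() if s == "MASTER_READY")
    for i in ready[1:]:
        new_states[i] = "SLAVE"  # defensive; a deterministic election yields one winner
    if ready:
        new_states[ready[0]] = "MASTER"
    return new_states


scns = {"node1": 120, "node2": 340, "node3": 340}
outcomes = {n: leader_election_transition(n, scns) for n in scns}
print(outcomes)           # {'node1': 'SLAVE', 'node2': 'SLAVE', 'node3': 'MASTER_READY'}
print(promote(outcomes))  # {'node1': 'SLAVE', 'node2': 'SLAVE', 'node3': 'MASTER'}
```

The defensive demotion in `promote` keeps the "at most one MASTER" constraint even if two participants somehow saw different snapshots.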
>
> The conditional transition idea also sounds promising. The only thing
> to consider is how the controller would know which participant to pick
> as MASTER_READY, as this selection depends on information
> available on the participants themselves. Is there a way for the
> controller to query the participants before calculating the
> IdealState? If it's possible to do so, the conditional transition idea
> seems elegant.
>
> Thanks again for the engaging discussion.
> - Puneet
>
> >
> >
> >
> > On Sat, Feb 23, 2013 at 7:22 PM, Puneet Zaroo <puneetzaroo@gmail.com> wrote:
> >>
> >> Kishore,
> >> Thanks for the detailed reply.
> >> Please see further comments inline.
> >>
> >> >
> >> > 3) The spectator is informed of the changes caused by each state
> >> > transition.
> >> >
> >>
> >> OK. Won't that cause a lot of overhead if there are a lot of
> >> spectators in the system? Or was the rationale that there will be just
> >> a few spectators in the system?
> >>
> >> > 2) Yes, it is possible to throttle the state transitions in a
> >> > controlled manner. You can specify the maximum number of transitions
> >> > that can occur at the resource, instance, instanceGroup, or cluster
> >> > level. Helix will ensure that none of those constraints are violated.
> >> >
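Count-based throttling across several scopes can be sketched as a single pass that admits a transition only if every scope it touches is still under its limit. This is a hypothetical illustration, not Helix's constraint engine: the function name, the shape of `limits`, and the omission of the instanceGroup scope are all simplifications.

```python
from collections import Counter


def throttle(pending, limits):
    """Hypothetical sketch of count-based transition throttling.

    pending -- (resource, instance, transition) tuples, in priority order
    limits  -- max transitions allowed per scope in this round, e.g.
               {"cluster": 10, "resource": 4, "instance": 2};
               a missing scope is unlimited. (Instance groups are omitted.)
    Returns the transitions that can be sent now without exceeding any limit;
    the rest wait for a later round.
    """
    counts = Counter()
    allowed = []
    for resource, instance, transition in pending:
        scopes = [("cluster",), ("resource", resource), ("instance", instance)]
        if all(counts[s] < limits.get(s[0], float("inf")) for s in scopes):
            for s in scopes:
                counts[s] += 1
            allowed.append((resource, instance, transition))
    return allowed


pending = [("db", f"node{i % 2}", "OFFLINE->SLAVE") for i in range(6)]
print([inst for _, inst, _ in throttle(pending, {"cluster": 3, "instance": 2})])
# ['node0', 'node1', 'node0']
```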
> >>
> >> What I had in mind was throttling based on time, not on the number of
> >> events. That is, if a slave partition is lost, the controller should
> >> wait for some configurable time before selecting another slave
> >> partition. This is to handle the case where a node is rebooting and we
> >> do not want its partitions to be moved to a new node immediately.
> >>
> >> > 1) Interposing on primary selection: yes, it is possible to implement
> >> > a custom primary selection algorithm. Here is how we achieve that at
> >> > LinkedIn:
> >> >
> >> > a) A separate entity watches the ExternalView, and as soon as it finds
> >> > out there is no primary for a partition, it can do the leader election
> >> > and set the IdealState. You can do this using the CustomCodeInvoker
> >> > option, which ensures that only one process watches the ExternalView,
> >> > computes the new primary, and sets the IdealState.
> >> >
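The watcher in (a) might look roughly like the callback below. The `{partition: {instance: state}}` shapes for the ExternalView and IdealState are assumed for illustration (they resemble a custom per-partition mapping), `choose_master` is a hypothetical hook for the election policy, and none of these names are Helix APIs. The sketch assumes the callback runs in exactly one process, which is what the CustomCodeInvoker is described as guaranteeing.

```python
def find_masterless(external_view):
    """external_view: {partition: {instance: state}} (shape assumed for illustration)."""
    return [p for p, replicas in external_view.items()
            if "MASTER" not in replicas.values()]


def on_external_view_change(external_view, ideal_state, choose_master):
    """Hypothetical callback, intended to run in exactly one process. For every
    partition with no MASTER, elect one of its SLAVE replicas and write a new
    mapping into the ideal state."""
    for partition in find_masterless(external_view):
        candidates = [i for i, s in external_view[partition].items() if s == "SLAVE"]
        if not candidates:
            continue  # nothing to promote; leave this partition alone
        master = choose_master(partition, candidates)
        ideal_state[partition] = {i: ("MASTER" if i == master else "SLAVE")
                                  for i in candidates}
    return ideal_state


ev = {"db_0": {"n1": "SLAVE", "n2": "SLAVE"},
      "db_1": {"n1": "MASTER", "n2": "SLAVE"}}
# choose_master is where a custom policy (e.g. highest SCN, or a vote) plugs in.
print(on_external_view_change(ev, {}, lambda p, c: max(c)))
# {'db_0': {'n1': 'SLAVE', 'n2': 'MASTER'}}
```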
> >> > Your suggestion of a LEADER_ELECTION state sounds interesting. Can you
> >> > elaborate a bit more on the state machine (states, transitions, and
> >> > constraints)? How will the participants get into this state?
> >> >
> >>
> >> Are there any examples of how to use the CustomCodeInvoker?
> >>
> >> Regarding a separate entity watching the ExternalView: maybe I did not
> >> follow this fully, but the external entity looks similar to the
> >> controller, so I am not sure it would solve this particular
> >> problem.
> >>
> >> We actually want the participants to take part in the decision of who
> >> should become the next Primary or Master. I haven't thought this
> >> through completely, but one way could be to add a state
> >> "LEADER_ELECTION" between the states "SLAVE" and "MASTER". In the
> >> "LEADER_ELECTION" state the participants communicate with each other
> >> and decide who should be the next Master, and the participant elected
> >> as the next "Master" sets the IdealState. This is fully auto mode,
> >> except for one transition, "LEADER_ELECTION" -> "MASTER", which is
> >> custom.
> >> Perhaps there are simpler ways of doing this.
> >>
> >> thanks,
> >> - Puneet
> >>
> >>
> >>
> >> > On Thu, Feb 21, 2013 at 5:22 PM, Puneet Zaroo <puneetzaroo@gmail.com> wrote:
> >> >>
> >> >> I am a Helix newbie. I have read the paper and the wiki pages and am
> >> >> just starting to get familiar with the source code. I had a few
> >> >> questions:
> >> >>
> >> >> 1) Is it possible to interpose on Primary selection? I.e., instead of
> >> >> relying completely on Helix to select a Primary, is it possible to
> >> >> implement a voting-based protocol, where the replicas have a say in
> >> >> who becomes the next primary? One possible way would be to have a
> >> >> state "LEADER_ELECTION", in which the replicas do the voting, and
> >> >> finally just the winner sets the ideal state with itself as the
> >> >> PRIMARY.
> >> >>
> >> >> Are there any gotchas in what I outlined above, or is there a
> >> >> completely different and better way of doing this?
> >> >>
> >> >> 2) Is it possible to throttle state transitions? E.g., if a node goes
> >> >> offline, the replicas hosted on it should not be transferred to a new
> >> >> node immediately, but in a throttled manner.
> >> >>
> >> >> 3) When is a spectator informed of the new ExternalView? Is it when
> >> >> the currentState becomes equal to the idealState, or is it informed
> >> >> of every state change caused by each state transition?
> >> >>
> >> >> thanks,
> >> >> - Puneet
> >> >
> >> >
> >
> >
>
