helix-user mailing list archives

From Alexandre Porcelli <porce...@redhat.com>
Subject Re: Some newbie questions
Date Tue, 21 May 2013 20:47:13 GMT
Thank you all for your very precious inputs... time to move on ;)

@Kishore, I was already doing almost exactly what you described... sorry to bother you with
this topic, but it was important for me to be sure that I was following the right path.


Regards,
---
Alexandre Porcelli
porcelli@redhat.com

On May 21, 2013, at 4:08 PM, kishore g <g.kishore@gmail.com> wrote:

> Thanks Alexandre. MasterSlave would have been ideal, but I see that you have already considered
> that. Here is the outline of what I have understood so far:
> 
> on each node
> 
> on startup
> -- sync with the current leader
> 
> on write request
> --- wait to become leader
> --- once you become leader, write the data to the local git repo, send a message to everyone
> to pull, and wait for responses.
> --- after all standbys acknowledge the request, release the lock.
> 
> How to solve this using Helix (your idea of using two separate resources and state models
> actually makes sense)
> =====================
> 	• Have two resources: a) data_availability, b) global_lock. For data_availability use
> a simple OnlineOffline model. For global_lock use the LeaderStandby model.
> 	• data_availability: during the offline->online transition, sync with the current leader.
> 	• global_lock: set the ideal state mode to AUTO (not AUTO_REBALANCE) with one partition
> for all repos (in the future you can have one per repo to support concurrent updates to
> multiple repos). Start with an empty preference list.
> 	• On every write request, the node simply adds itself to the end of the preference list.
> Helix will make the first entry in the preference list the leader. In the become-leader
> callback, you send a message to all nodes in the data_availability resource (you can set
> the self-excluded flag to true so a node does not send the message to itself). Once you
> get the acknowledgements, you update the ideal state and remove yourself from the
> preference list. The old leader will then get the corresponding transition out of LEADER.
> A rough sketch of this flow follows below.
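> 
> For what it's worth, a rough, untested sketch in Java of the setup and of the write-request
> step (the resource names, the "global_lock_0" partition name, and variables like zkAddress,
> clusterName, numInstances and myInstanceName are just placeholders for illustration):
> 
> HelixAdmin admin = new ZKHelixAdmin(zkAddress);
> 
> // a) data_availability: simple OnlineOffline model, spread across all nodes by Helix
> admin.addResource(clusterName, "data_availability", 1, "OnlineOffline",
>     IdealStateModeProperty.AUTO_REBALANCE.toString());
> admin.rebalance(clusterName, "data_availability", numInstances);
> 
> // b) global_lock: LeaderStandby in AUTO mode, one partition, empty preference list to start
> admin.addResource(clusterName, "global_lock", 1, "LeaderStandby",
>     IdealStateModeProperty.AUTO.toString());
> 
> // on a write request, the node appends itself to the end of the preference list
> IdealState lockState = admin.getResourceIdealState(clusterName, "global_lock");
> List<String> prefList = lockState.getRecord().getListField("global_lock_0");
> if (prefList == null) {
>   prefList = new ArrayList<String>();
> }
> prefList.add(myInstanceName);
> lockState.getRecord().setListField("global_lock_0", prefList);
> admin.setResourceIdealState(clusterName, "global_lock", lockState);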
> 
> Couple of things 
> 
> 	• While updating the ideal state, make sure you use the version flag to avoid a race
> condition between concurrent updates.
> 	• After every write you should probably record the git commit number using the Helix
> property store; there is an API that allows you to limit the number of elements you store.
> Every time a node processes a write, it should verify that the latest commit version matches
> its local commit version. A rough sketch of this follows below.
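> 
> As a rough sketch of that last point (the property-store path, the field name, and variables
> like manager, repoName, commitId and localCommitId are assumptions, not anything Helix mandates):
> 
> // after a successful write, record the latest commit for the repo
> ZkHelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();
> ZNRecord commitRecord = new ZNRecord(repoName);
> commitRecord.setSimpleField("latestCommit", commitId);
> store.set("/GIT_COMMITS/" + repoName, commitRecord, AccessOption.PERSISTENT);
> 
> // before applying a write, verify that the local commit matches the stored one
> ZNRecord latest = store.get("/GIT_COMMITS/" + repoName, null, AccessOption.PERSISTENT);
> boolean upToDate = latest != null
>     && localCommitId.equals(latest.getSimpleField("latestCommit"));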
> 
> Note that even though this solution works, it will be limited by the throughput of
> ZooKeeper, since every write results in a ZooKeeper access, and latency won't be great. This
> is definitely a good case where you could use an in-memory data grid like Infinispan or
> Hazelcast to get better throughput and latency.
> 
> Let me know if it makes sense.
> 
> Thanks,
> Kishore G
> 
> 
> 
> 
> On Tue, May 21, 2013 at 6:19 AM, Alexandre Porcelli <porcelli@redhat.com> wrote:
> Hi Swaroop,
> 
> Thanks for your input... In fact your description was my initial impression too; my
> first PoC used MasterSlave, but afterwards I got some other strong requirements, such as
> all nodes being able to execute writes (redirecting writes to the master isn't very
> efficient compared to a git pull).
> Another important thing to consider is that my Git cluster is usually (though not always,
> once you have external `clones`) consumed `locally` (by a web app that the cluster is
> deployed alongside), and the web app load balancer isn't something that I can control (my
> typical scenario is mod_cluster with a JBoss application server cluster, and the Git
> cluster runs inside that JBoss cluster).
> 
> Regards,
> ---
> Alexandre Porcelli
> porcelli@redhat.com
> 
> 
> 
> On May 21, 2013, at 4:15 AM, Swaroop Jagadish <sjagadish@linkedin.com> wrote:
> 
> > Hello Alexandre,
> > Based on your description, it looks like MasterSlave state model is best
> > suited for your use case. You distribute the different git repositories
> > evenly across the cluster using "auto rebalance" mode in the state model.
> > A git repo will be mapped to a Helix resource and for a given repo, there
> > is only one node which is the master. Thus, there is only one node which
> > can write to a given repo. The client uses Helix's external view in order
> > to determine which node is the master for a given repo (this can be accomplished
> > using the RoutingTableProvider class). In order to keep the repositories
> > in sync, whenever a write happens at the master, the master can send a
> > message to all the slaves to sync their repos. A slave can either reject
> > any direct writes it receives from the client or forward them to the master
> > node.
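> >
> > For illustration, a minimal, untested sketch of that lookup (the resource name "myRepo",
> > the partition name "myRepo_0" and the manager variable are placeholders):
> >
> > RoutingTableProvider routingTableProvider = new RoutingTableProvider();
> > manager.addExternalViewChangeListener(routingTableProvider);
> >
> > List<InstanceConfig> masters =
> >     routingTableProvider.getInstances("myRepo", "myRepo_0", "MASTER");
> > if (!masters.isEmpty()) {
> >   InstanceConfig master = masters.get(0);
> >   // route the write to master.getHostName() + ":" + master.getPort()
> > }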
> >
> > Let me know if that makes sense
> >
> > Regards,
> > Swaroop
> >
> > On 5/20/13 10:36 AM, "Alexandre Porcelli" <porcelli@redhat.com> wrote:
> >
> >> Hi Kishore,
> >>
> >> Let me try to explain my needs with my real-world usage scenario, so it
> >> will be easier for you to understand.
> >>
> >> In a simple form, what I'm doing is a Git cluster (using JGit and
> >> Apache Helix for that). External clients can push data to any node of the
> >> cluster, but in order to keep the cluster synced properly (to
> >> avoid conflicts) I need to be sure that just one node is writing at a time
> >> (the single-global-lock role). Just before the current node unlocks, I
> >> notify all other members of the cluster (using the Messaging API) that they
> >> must sync (the message indicates which repo was updated). The unlock
> >> operation releases the lock (so others that may need to update data can do
> >> so).
> >> My current setup for this uses the "LeaderStandby" model with one
> >> resource (which I name the git-lock resource, with only one partition,
> >> git-lock_0); the Leader is the node that holds the lock, and the standby
> >> queue is formed by nodes that are willing to update data... nodes that
> >> are not trying to update data aren't in standby (they're offline because
> >> the partition is disabled).
> >> Aside from the global lock, when a new node joins the cluster it needs
> >> to sync all the git repositories - I don't have a fixed list of those repos,
> >> which is why I need to query the cluster for a list of existing
> >> repos. This query can be answered by any member of the existing cluster
> >> (since all of them are kept in sync by the global lock).
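> >>
> >> For reference, the notify-before-unlock step looks roughly like this (just a sketch; the
> >> message subtype, the "repo" field and the updatedRepoName variable are placeholders):
> >>
> >> ClusterMessagingService messaging = manager.getMessagingService();
> >>
> >> Criteria recipients = new Criteria();
> >> recipients.setInstanceName("%");                        // every instance
> >> recipients.setRecipientInstanceType(InstanceType.PARTICIPANT);
> >> recipients.setSessionSpecific(true);
> >> recipients.setSelfExcluded(true);                       // don't message myself
> >>
> >> Message syncMsg = new Message(Message.MessageType.USER_DEFINE_MSG, UUID.randomUUID().toString());
> >> syncMsg.setMsgState(Message.MessageState.NEW);
> >> syncMsg.setMsgSubType("SYNC_REPO");
> >> syncMsg.getRecord().setSimpleField("repo", updatedRepoName);
> >>
> >> int sent = messaging.send(recipients, syncMsg);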
> >>
> >> Is it clear now?
> >>
> >> What I'm wondering is whether I'm mixing two different things into
> >> just one (the single global lock and the cluster's git repository list).
> >>
> >> Maybe it's worth mentioning that in the near future I plan to get rid of
> >> the single global lock and have a per-repo lock...
> >>
> >>
> >> Again.. thanks in advance!
> >>
> >> Regards,
> >> ---
> >> Alexandre Porcelli
> >> porcelli@redhat.com
> >>
> >>
> >>
> >>
> >> On May 20, 2013, at 2:05 PM, kishore g <g.kishore@gmail.com> wrote:
> >>
> >>> Hi Alex,
> >>>
> >>> Let me try to formulate your requirements
> >>>
> >>> 1. Have a global lock: of all nodes, only one node needs to be LEADER.
> >>> 2. When new nodes are added, they automatically become STANDBY and sync
> >>> data with the existing LEADER.
> >>>
> >>> Both of the above requirements can be satisfied with AUTO_REBALANCE mode.
> >>> In your original email, you mentioned releasing the lock; can you
> >>> explain when you want to release the lock? Sorry, I should have asked
> >>> this earlier. I think this is the requirement that is causing some
> >>> confusion. Also, in 0.6.1 we have added a feature where you can plug in
> >>> custom rebalancer logic when the pipeline is run, so you can actually
> >>> come up with your own rebalancing logic. But it's not documented :(
> >>>
> >>> You might be right about using two state models or configuring Helix with
> >>> a custom state model. But I want to make sure I understand your use case
> >>> before suggesting that.
> >>>
> >>> thanks,
> >>> Kishore G
> >>>
> >>>
> >>>
> >>> On Mon, May 20, 2013 at 9:17 AM, Alexandre Porcelli
> >>> <porcelli@redhat.com> wrote:
> >>> Hi Kishore,
> >>>
> >>> Once again, thanks for your support... it has been really valuable.
> >>>
> >>> I've been thinking, and I'd like to share my thoughts and ask your opinion
> >>> about it (any comments are welcome). My general need (I think I've
> >>> already written about it, but here's a small recap) is a single global
> >>> lock to control data changes and, at the same time, a way to check the current
> >>> state of a live node in order to be able to sync when a new node joins
> >>> the cluster.
> >>>
> >>> My latest questions about being able to manipulate transitions from the API
> >>> were about avoiding having a node in offline mode - since moving away from
> >>> offline is the transition that triggers the sync, and if I disable a
> >>> resource/node it is moved to offline automatically (using
> >>> AUTO_REBALANCE). Kishore showed me how to change my cluster from
> >>> AUTO_REBALANCE to AUTO so I can have control over those transitions...
> >>>
> >>> Now here is what I've been thinking about all of this: it seems that I'm
> >>> mixing two different things into just one cluster/resource - one is the
> >>> lock and the other is the cluster availability - maybe I just need to have
> >>> two different resources for that, one for the lock and the other for the
> >>> real data availability - Wdyt? Another thing that comes to mind
> >>> is that maybe my need doesn't fit the existing state models, and I'd need
> >>> to create a new one with my own config.
> >>>
> >>> I'd like to hear what you think about it... recommendations? thoughts,
> >>> opinions, considerations... anything is welcomed.
> >>>
> >>> Regards,
> >>> ---
> >>> Alexandre Porcelli
> >>> porcelli@redhat.com
> >>>
> >>>
> >>> On May 17, 2013, at 4:40 AM, kishore g <g.kishore@gmail.com> wrote:
> >>>
> >>>> Hi Alexandre,
> >>>>
> >>>> You can get more control in AUTO mode; you are currently using
> >>>> AUTO_REBALANCE, where Helix decides who should be the leader and where it
> >>>> should be. If you look at the IdealState, it basically looks like this:
> >>>> p1:[]
> >>>>
> >>>> In AUTO mode you set the preference list for each partition,
> >>>> so you can set something like p1:[n1,n2,n3]
> >>>>
> >>>> In this case, if n1 is alive, Helix will make n1 the leader and n2, n3 will
> >>>> be standby. If you want to make someone else the leader, say n2, simply
> >>>> change this to p1:[n2,n3,n1].
> >>>>
> >>>> Change this line in your code
> >>>>
> >>>> admin.addResource(clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>     IdealStateModeProperty.AUTO_REBALANCE.toString());
> >>>> admin.rebalance(clusterName, lockGroupName, numInstances);
> >>>>
> >>>> to
> >>>>
> >>>> admin.addResource(clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>     IdealStateModeProperty.AUTO.toString());
> >>>> admin.rebalance(clusterName, lockGroupName, numInstances);
> >>>>
> >>>>
> >>>> // if you want to change the current leader, you can do the following:
> >>>>
> >>>> IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
> >>>>
> >>>> List<String> preferenceList; // set the new leader you want as the first entry
> >>>>
> >>>> idealState.getRecord().setListField(partitionName, preferenceList);
> >>>>
> >>>> admin.addResource(clusterName, resourceName, idealState);
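> >>>>
> >>>> For example (just a sketch), to make n2 the new leader as in the p1:[n2,n3,n1] example
> >>>> above:
> >>>>
> >>>> List<String> preferenceList = Arrays.asList("n2", "n3", "n1"); // new leader first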
> >>>>
> >>>>
> >>>>
> >>>> Read more about the different execution modes
> >>>>
> >>>> http://helix.incubator.apache.org/Concepts.html and
> >>>>
> >>>> http://helix.incubator.apache.org/Features.html
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Kishore G
> >>>>
> >>>>
> >>>>
> >>>> On Thu, May 16, 2013 at 11:09 PM, Alexandre Porcelli
> >>> <porcelli@redhat.com> wrote:
> >>>> Hello all,
> >>>>
> >>>> Sorry to revive this thread, but I think I'll have to ask again...
> >>>> is it possible to force, via an API call, a transition from Leader to
> >>>> "Wait" without disabling an instance or partition? The transition from
> >>>> Leader to Offline triggered by the disabled partition is causing me
> >>>> some trouble...
> >>>> The main problem is that my transition from "Offline" to "Standby"
> >>>> syncs data with the rest of the cluster (an expensive task that should
> >>>> be executed only if that node was really offline; in other words: there
> >>>> was a partition, the node crashed, or whatever).
> >>>>
> >>>> I predict that I may need to build my own transition model... not sure
> >>>> (not even sure how to do it and be able to control/expose that
> >>>> transition from Leader to "Wait")...
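> >>>>
> >>>> Roughly, the state model callbacks in question look like this (just a sketch; the class
> >>>> name and comments are placeholders, not my actual code):
> >>>>
> >>>> @StateModelInfo(initialState = "OFFLINE", states = {"LEADER", "STANDBY", "OFFLINE"})
> >>>> public class GitLockStateModel extends StateModel {
> >>>>
> >>>>   @Transition(from = "OFFLINE", to = "STANDBY")
> >>>>   public void onBecomeStandbyFromOffline(Message message, NotificationContext context) {
> >>>>     // expensive: sync all git repos from the rest of the cluster
> >>>>   }
> >>>>
> >>>>   @Transition(from = "STANDBY", to = "LEADER")
> >>>>   public void onBecomeLeaderFromStandby(Message message, NotificationContext context) {
> >>>>     // lock acquired: this node may write
> >>>>   }
> >>>>
> >>>>   @Transition(from = "LEADER", to = "STANDBY")
> >>>>   public void onBecomeStandbyFromLeader(Message message, NotificationContext context) {
> >>>>     // lock released
> >>>>   }
> >>>> }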
> >>>>
> >>>> Well... any help/suggestion is really welcomed!
> >>>>
> >>>> Cheers,
> >>>> ---
> >>>> Alexandre Porcelli
> >>>> porcelli@redhat.com
> >>>>
> >>>> On May 2, 2013, at 2:26 PM, Alexandre Porcelli <porcelli@redhat.com>
> >>> wrote:
> >>>>
> >>>>> Hi Vinayak,
> >>>>>
> >>>>> You were right, it was all my mistake! Disabling the partition works like
> >>>>> a charm! Thank you very much.
> >>>>>
> >>>>> Regards,
> >>>>> ---
> >>>>> Alexandre Porcelli
> >>>>> porcelli@redhat.com
> >>>>>
> >>>>> On May 2, 2013, at 1:22 PM, Vinayak Borkar <vinayakb@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> Looking at the signature of HelixAdmin.enablePartition, I see this:
> >>>>>>
> >>>>>> void enablePartition(boolean enabled,
> >>>>>>                      String clusterName,
> >>>>>>                      String instanceName,
> >>>>>>                      String resourceName,
> >>>>>>                      List<String> partitionNames);
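> >>>>>>
> >>>>>> So, for example, a call like this (the cluster, instance, resource and partition names
> >>>>>> here are hypothetical) disables the partition on just that one instance:
> >>>>>>
> >>>>>> admin.enablePartition(false, "myCluster", "node_1", "lockResource",
> >>>>>>     Arrays.asList("lockResource_0"));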
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> So when you disable the partition, you are doing so only on a
> >>>>>> particular instance. So my understanding is that the same partition at
> >>>>>> other instances will participate in an election to come out of standby.
> >>>>>>
> >>>>>> Vinayak
> >>>>>>
> >>>>>>
> >>>>>> On 5/2/13 9:14 AM, Alexandre Porcelli wrote:
> >>>>>>> Hi Vinayak,
> >>>>>>>
> >>>>>>> Thanks for your quick answer, but I don't think this would be
> >>>>>>> the case... since the partition `represents` the locked resource, if I
> >>>>>>> disable it, no other instance in the cluster will be able to be promoted
> >>>>>>> to Leader (at this point other nodes should be in standby, just waiting
> >>>>>>> to be able to acquire the lock - in other words, to become Leader).
> >>>>>>> Anyway thanks for your support.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> ---
> >>>>>>> Alexandre Porcelli
> >>>>>>> porcelli@redhat.com
> >>>>>>>
> >>>>>>>
> >>>>>>> On May 2, 2013, at 1:06 PM, Vinayak Borkar <vinayakb@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> 1. I'm using LeaderStandby in order to build a single global
> >>>>>>>>> lock on my cluster, and it works as expected... but in order to release the
> >>>>>>>>> lock I have to put the current leader in standby... I could achieve this
> >>>>>>>>> by disabling the current instance. It works, but doing this I lose (at
> >>>>>>>>> least it seems so) the ability to send/receive user-defined messages.
> >>>>>>>>> I'd like to know if it's possible to, via an API call, force a
> >>>>>>>>> transition from Leader to Standby without disabling an instance.
> >>>>>>>>
> >>>>>>>> I am a newbie to Helix too and I had a similar question a few
> >>>>>>>> days ago. Have you looked into disabling the resource by using the
> >>>>>>>> disablePartition() call in HelixAdmin using a partition number of 0?
> >>>>>>>> This should disable just the resource without impacting the instance.
> >>>>>>>>
> >>>>>>>> Vinayak
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2. I've been taking a quick look at the Helix codebase, more
> >>>>>>>>> specifically at the ZooKeeper usage. It seems that you're using ZooKeeper as the
> >>>>>>>>> default implementation, but the Helix architecture is not tied to it, right?
> >>>>>>>>> I'm asking because I'm interested in implementing (in the near future)
> >>>>>>>>> a different backend (Infinispan).
> >>>>>>>>>
> >>>>>>>>> That's it for now...  thanks in advance.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> ---
> >>>>>>>>> Alexandre Porcelli
> >>>>>>>>> porcelli@redhat.com
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >
> 
> 
