helix-user mailing list archives

From Ming Fang <mingf...@mac.com>
Subject Re: Prevent failback to MASTER after failover
Date Thu, 09 May 2013 16:33:30 GMT
Thanks for adding the test case.
So it looks like I just have to remove the INSTANCE constraint.


Sent from my iPad

On May 8, 2013, at 7:18 PM, Zhen Zhang <nehzgnahz@gmail.com> wrote:

> Hi Ming, I've added a test case for this; see TestMessageThrottle2.java. It is just a copy of your example with minor changes.
> 
> https://github.com/apache/incubator-helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/TestMessageThrottle2.java
> 
> 
> At step 3), when you are adding Node-1, there are three state transition messages that need to be sent:
> T1) Offline->Slave for Node-1
> T2) Master->Slave for Node-2
> T3) Slave->Master for Node-1
> 
> Note that T1 and T2 can be sent together. If you are using an instance-level constraint like this:
>     // limit one transition message at a time for each instance
>     builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>            .addConstraintAttribute("INSTANCE", ".*")
>            .addConstraintAttribute("CONSTRAINT_VALUE", "1");
> 
> Then T1 and T2 will be sent together in the first round, since they are sent to two different nodes. T3 will be sent in the next round.
> 
> If you are specifying a cluster-level constraint like this:
>     // limit one transition message at a time for the entire cluster
>     builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>            .addConstraintAttribute("CONSTRAINT_VALUE", "1");
> 
> Then the Helix controller will send T1 in the first round, then T2, then T3. T1 is sent before T2 because, in the state model definition, you specified that the Offline->Slave transition has a higher priority than Master->Slave.
> 
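To convince myself of the two sequences described above, here is a rough toy model of the rounds. It is plain Java and does not use any Helix API; the class name, the Msg fields, and the rule that T3 has to wait for both T1 and T2 are my own assumptions for illustration, not how MessageThrottleStage is actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Helix code and calls no Helix API.
class ThrottleSketch {
    // One pending state-transition message (all fields are illustrative).
    static final class Msg {
        final String id;         // e.g. "T1"
        final String instance;   // target node
        final int priority;      // lower value is considered first
        final Set<String> after; // ids that must complete before this one

        Msg(String id, String instance, int priority, String... after) {
            this.id = id;
            this.instance = instance;
            this.priority = priority;
            this.after = new HashSet<>(Arrays.asList(after));
        }
    }

    // Returns the ids sent in each round. With perInstance=true the limit is
    // counted per target node; otherwise one counter covers the whole cluster.
    static List<List<String>> rounds(List<Msg> msgs, boolean perInstance, int limit) {
        List<List<String>> out = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < msgs.size()) {
            // Collect messages whose prerequisites have completed.
            List<Msg> ready = new ArrayList<>();
            for (Msg m : msgs) {
                if (!done.contains(m.id) && done.containsAll(m.after)) {
                    ready.add(m);
                }
            }
            ready.sort(Comparator.comparingInt((Msg m) -> m.priority));

            // Send in priority order until the relevant counter hits the limit.
            Map<String, Integer> used = new HashMap<>();
            List<String> sent = new ArrayList<>();
            for (Msg m : ready) {
                String key = perInstance ? m.instance : "CLUSTER";
                int n = used.getOrDefault(key, 0);
                if (n < limit) {
                    used.put(key, n + 1);
                    sent.add(m.id);
                }
            }
            if (sent.isEmpty()) {
                throw new IllegalStateException("no message can be sent");
            }
            done.addAll(sent);
            out.add(sent);
        }
        return out;
    }

    // T1: Offline->Slave on node1, T2: Master->Slave on node2,
    // T3: Slave->Master on node1 (assumed to wait for T1 and T2).
    static List<Msg> example() {
        return Arrays.asList(
            new Msg("T1", "node1", 1),
            new Msg("T2", "node2", 2),
            new Msg("T3", "node1", 3, "T1", "T2"));
    }

    public static void main(String[] args) {
        System.out.println("instance-level: " + rounds(example(), true, 1));  // [[T1, T2], [T3]]
        System.out.println("cluster-level:  " + rounds(example(), false, 1)); // [[T1], [T2], [T3]]
    }
}
```

The only difference between the two runs is the key used for the counter, which is why dropping the INSTANCE attribute turns the per-node limit into a cluster-wide one in this model.
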
> The test runs without problem. Here is the output:
> 
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Start zookeeper at localhost:2183 in thread main
> START TestMessageThrottle2 at Wed May 08 15:57:21 PDT 2013
> Creating cluster: TestMessageThrottle2
> Starting Controller{Cluster:TestMessageThrottle2, Port:12000, Zookeeper:localhost:2183}
> StatusPrinter.onIdealStateChange:state = MyResource, {IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2, STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{}{}
> StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@6e3404f
> StatusPrinter.onInstanceConfigChange:instanceConfig = node2, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node2, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60006}{}{}
> StatusPrinter.onIdealStateChange:state = MyResource, {IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2, STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node2, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node2, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60006}{}{}
> StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@76d3046
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node1, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node2, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node1, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node2, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node1, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60008}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node2, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60006}{}{}
> StatusPrinter.onIdealStateChange:state = MyResource, {IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2, STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node1, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onInstanceConfigChange:instanceConfig = node2, {HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node1, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60008}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node2, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60006}{}{}
> StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@b9deddb
> StatusPrinter.onLiveInstanceChange:liveInstance = node1, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60008}{}{}
> StatusPrinter.onLiveInstanceChange:liveInstance = node2, {HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1, SESSION_ID=13e865cfca60006}{}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node1=SLAVE, node2=MASTER}}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node1=SLAVE, node2=MASTER}}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
> StatusPrinter.onExternalViewChange:externalView = MyResource, {BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
> true: wait 489ms, ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestMessageThrottle2@localhost:2183)
> END TestMessageThrottle2 at Wed May 08 15:57:30 PDT 2013
> 
> Thanks,
> Jason
> 
> 
> 
> 
> On Tue, May 7, 2013 at 8:25 PM, Ming Fang <mingfang@mac.com> wrote:
>> Here is the code that I'm using to test
>> https://github.com/mingfang/apache-helix/tree/master/helix-example
>> 
>> In ZAC.java line 134 is where I'm adding the constraint.
>> Line 204 is where I'm setting the state transition priority list.
>> 
>> The steps I'm using are
>> 1-Run ZAC and wait for the StatusPrinter printouts
>> 2-Run Node2 and wait for it to transition to MASTER
>> 3-Run Node1
>> At this point we see the problem where the external view will say node1=SLAVE and node2=SLAVE.
>> 
>> I can get the MessageThrottleStage to work by replacing line 205 with this
>>           String key=item.toString();
>> But even with message throttle working, I still can't get the transition sequence I need.
>> 
>> 
>> On May 7, 2013, at 11:43 AM, kishore g <g.kishore@gmail.com> wrote:
>> 
>>> Can you provide the code snippet you used to add the constraint? It looks like you are setting the constraint at the INSTANCE level.
>>> 
>>> 
>>> 
>>> 
>>> On Mon, May 6, 2013 at 9:52 PM, Ming Fang <mingfang@mac.com> wrote:
>>>> I almost have this working.
>>>> However I'm experiencing a potential bug in MessageThrottleStage line 205.
>>>> The problem is that the throttleMap's key contains INSTANCE=<id> in it.
>>>> This effectively makes trying to throttle across the entire cluster impossible.
>>>> 
>>>> On Apr 24, 2013, at 2:07 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>>> 
>>>> > Hi Ming, to set the constraint so that only one transition message is sent at a
>>>> > time, you can take a look at the test example in TestMessageThrottle. You
>>>> > need to add a message constraint as follows:
>>>> >
>>>> > // build a message constraint
>>>> > ConstraintItemBuilder builder = new ConstraintItemBuilder();
>>>> > builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>>>> >   .addConstraintAttribute("INSTANCE", ".*")
>>>> >   .addConstraintAttribute("CONSTRAINT_VALUE", "1");
>>>> >
>>>> > // add the constraint to the cluster
>>>> > helixAdmin.setConstraint(clusterName, ConstraintType.MESSAGE_CONSTRAINT,
>>>> > "constraint1", builder.build());
>>>> >
>>>> >
>>>> > A message constraint is separate from the ideal state and is not specified in
>>>> > the JSON file of the ideal state.
>>>> >
>>>> > Thanks,
>>>> > Jason
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On 4/23/13 2:40 PM, "Ming Fang" <mingfang@mac.com> wrote:
>>>> >
>>>> >> Kishore
>>>> >>
>>>> >> It sounds like the solution is to set the constraints so that only one
>>>> >> transition happens at a time.
>>>> >> Can you point me to an example of how to do this?
>>>> >> Also is this something I can set in the JSON file?
>>>> >>
>>>> >> Sent from my iPad
>>>> >>
>>>> >> On Apr 1, 2013, at 11:32 AM, kishore g <g.kishore@gmail.com> wrote:
>>>> >>
>>>> >>> Hi Ming,
>>>> >>>
>>>> >>> Thanks for the detailed explanation. Actually, 5 & 6 happen in
>>>> >>> parallel; Helix tries to parallelize the transitions as much as possible.
>>>> >>>
>>>> >>> There is another feature in Helix that allows you to sort the
>>>> >>> transitions based on some priority. See STATE_TRANSITION_PRIORITY_LIST in
>>>> >>> the state model definition. But after sorting, Helix will send as many as
>>>> >>> possible in parallel without violating constraints.
>>>> >>>
>>>> >>> In your case you want the priority to be S-M, O-S, M-S, but that is not
>>>> >>> sufficient, since O-S and M-S will be sent in parallel.
>>>> >>>
>>>> >>> Additionally, what you need to do is set a constraint on transitions so that
>>>> >>> there is only one transition per partition at any time. This will
>>>> >>> basically make the order 6 5 7, and they will be executed sequentially
>>>> >>> per partition.
>>>> >>>
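The sorting step described above can be pictured with a small standalone sketch. It only illustrates ordering pending transitions by their position in a priority list; the class name, the method, and the "FROM-TO" strings are hypothetical and nothing here is taken from Helix itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: not Helix code; the transition strings are made up.
class PrioritySortSketch {
    // Orders pending transitions by their index in priorityList.
    // Transitions that are not listed sort last, keeping their relative order.
    static List<String> sortByPriority(List<String> pending, List<String> priorityList) {
        List<String> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingInt((String t) -> {
            int i = priorityList.indexOf(t);
            return i < 0 ? Integer.MAX_VALUE : i;
        }));
        return sorted;
    }

    public static void main(String[] args) {
        // The S-M, O-S, M-S priority from the message above, spelled out.
        List<String> priority = Arrays.asList("SLAVE-MASTER", "OFFLINE-SLAVE", "MASTER-SLAVE");
        List<String> pending = Arrays.asList("MASTER-SLAVE", "OFFLINE-SLAVE", "SLAVE-MASTER");
        System.out.println(sortByPriority(pending, priority));
        // [SLAVE-MASTER, OFFLINE-SLAVE, MASTER-SLAVE]
    }
}
```

Sorting alone only fixes the order in which messages are considered; as the message says, a constraint is still needed to keep them from going out in parallel.
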
>>>> >>> We will try this out and let you know; you don't need to change any
>>>> >>> code in Helix or your app. You should be able to tweak the configuration
>>>> >>> dynamically.
>>>> >>>
>>>> >>> We will try to think of a more elegant way to solve this. I will file
>>>> >>> a jira and add more info.
>>>> >>>
>>>> >>> I also want to ask this question: if it is mandatory for a node to talk
>>>> >>> to the MASTER when it comes up, what happens when the nodes are started for
>>>> >>> the first time, or when all nodes crash and come back?
>>>> >>>
>>>> >>> thanks,
>>>> >>> Kishore G
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >
>>>> 
>>> 
>> 
> 
