helix-user mailing list archives

From Zhen Zhang <nehzgn...@gmail.com>
Subject Re: Prevent failback to MASTER after failover
Date Wed, 08 May 2013 23:18:28 GMT
Hi Ming, I've added a test case for this, see TestMessageThrottle2.java. It
is just a copy of your example with minor changes.

https://github.com/apache/incubator-helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/TestMessageThrottle2.java


At step 3, when you add Node-1, three state transition messages need to
be sent:
T1) Offline->Slave for Node-1
T2) Master->Slave for Node-2
T3) Slave->Master for Node-1

Note that T1 and T2 can be sent together. If you are using an instance-level
constraint like this:

    // limit one transition message at a time for each instance
    builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
           .addConstraintAttribute("INSTANCE", ".*")
           .addConstraintAttribute("CONSTRAINT_VALUE", "1");

then T1 and T2 will be sent together in the first round, since they go to
two different nodes, and T3 will be sent in the next round.
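The round-by-round behavior of the instance-level constraint can be sketched
in plain Java. This is an illustrative simulation, not Helix source: the
message labels and the selection loop are my own, and only a per-instance
limit of one pending transition is modeled.

```java
import java.util.*;

// Sketch: group transition messages into send rounds, allowing at most
// one in-flight message per target instance per round (mirroring
// INSTANCE=".*", CONSTRAINT_VALUE="1").
public class InstanceThrottleDemo {

    // Each message is {name, targetInstance}.
    static List<List<String>> assignRounds(List<String[]> msgs) {
        List<List<String>> rounds = new ArrayList<>();
        List<String[]> pending = new ArrayList<>(msgs);
        while (!pending.isEmpty()) {
            List<String> round = new ArrayList<>();
            Set<String> busy = new HashSet<>();
            Iterator<String[]> it = pending.iterator();
            while (it.hasNext()) {
                String[] m = it.next();
                if (busy.add(m[1])) { // instance not yet used this round
                    round.add(m[0]);
                    it.remove();
                }
            }
            rounds.add(round);
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<String[]> msgs = Arrays.asList(
            new String[] {"T1 Offline->Slave", "node1"},
            new String[] {"T2 Master->Slave", "node2"},
            new String[] {"T3 Slave->Master", "node1"});
        // Round 1 holds T1 and T2 (different instances); round 2 holds T3.
        System.out.println(assignRounds(msgs));
    }
}
```

Running it groups T1 and T2 into the first round and leaves T3 alone in the
second, matching the sequence described above.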

If you are specifying a cluster-level constraint like this:

    // limit one transition message at a time for the entire cluster
    builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
           .addConstraintAttribute("CONSTRAINT_VALUE", "1");

then the Helix controller will send T1 in the first round, then T2, then T3.
T1 is sent before T2 because, in the state model definition, you specified
that the Offline->Slave transition has a higher priority than Master->Slave.
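With the cluster-wide limit of one, each round sends exactly one message, so
the send order reduces to sorting the pending transitions by priority. Another
hedged sketch (again a simulation, not Helix source; the numeric priorities
are my own stand-ins for a STATE_TRANSITION_PRIORITY_LIST in which
Offline->Slave outranks Master->Slave):

```java
import java.util.*;

// Sketch: with CONSTRAINT_VALUE=1 for the whole cluster, one message goes
// out per round, highest-priority pending transition first.
public class ClusterThrottleDemo {

    // Lower number = higher priority.
    static List<String> sendOrder(Map<String, Integer> priorityByMsg) {
        List<String> order = new ArrayList<>(priorityByMsg.keySet());
        order.sort(Comparator.comparingInt(priorityByMsg::get));
        return order; // one message per round, in this order
    }

    public static void main(String[] args) {
        Map<String, Integer> prio = new HashMap<>();
        prio.put("T2 Master->Slave (node2)", 1);
        prio.put("T3 Slave->Master (node1)", 2);
        prio.put("T1 Offline->Slave (node1)", 0);
        // Sorted by priority: T1, then T2, then T3.
        System.out.println(sendOrder(prio));
    }
}
```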

The test runs without problems. Here is the output:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start zookeeper at localhost:2183 in thread main
START TestMessageThrottle2 at Wed May 08 15:57:21 PDT 2013
Creating cluster: TestMessageThrottle2
Starting Controller{Cluster:TestMessageThrottle2, Port:12000,
Zookeeper:localhost:2183}
StatusPrinter.onIdealStateChange:state = MyResource,
{IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2,
STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{}{}
StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@6e3404f
StatusPrinter.onInstanceConfigChange:instanceConfig = node2,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node2,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60006}{}{}
StatusPrinter.onIdealStateChange:state = MyResource,
{IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2,
STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
StatusPrinter.onInstanceConfigChange:instanceConfig = node2,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node2,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60006}{}{}
StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@76d3046
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
StatusPrinter.onInstanceConfigChange:instanceConfig = node1,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onInstanceConfigChange:instanceConfig = node2,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
StatusPrinter.onInstanceConfigChange:instanceConfig = node1,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onInstanceConfigChange:instanceConfig = node2,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node1,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60008}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node2,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60006}{}{}
StatusPrinter.onIdealStateChange:state = MyResource,
{IDEAL_STATE_MODE=AUTO, NUM_PARTITIONS=1, REPLICAS=2,
STATE_MODEL_DEF_REF=MasterSlave}{}{MyResource=[node1, node2]}
StatusPrinter.onInstanceConfigChange:instanceConfig = node1,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onInstanceConfigChange:instanceConfig = node2,
{HELIX_ENABLED=true, HELIX_HOST=localhost}{}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node2=MASTER}}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node1,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60008}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node2,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60006}{}{}
StatusPrinter.onControllerChange:org.apache.helix.NotificationContext@b9deddb
StatusPrinter.onLiveInstanceChange:liveInstance = node1,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60008}{}{}
StatusPrinter.onLiveInstanceChange:liveInstance = node2,
{HELIX_VERSION=${project.version}, LIVE_INSTANCE=11881@zzhang-mn1,
SESSION_ID=13e865cfca60006}{}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node1=SLAVE, node2=MASTER}}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node1=SLAVE, node2=MASTER}}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
StatusPrinter.onExternalViewChange:externalView = MyResource,
{BUCKET_SIZE=0}{MyResource={node1=MASTER, node2=SLAVE}}{}
true: wait 489ms,
ClusterStateVerifier$BestPossAndExtViewZkVerifier(TestMessageThrottle2@localhost
:2183)
END TestMessageThrottle2 at Wed May 08 15:57:30 PDT 2013

Thanks,
Jason




On Tue, May 7, 2013 at 8:25 PM, Ming Fang <mingfang@mac.com> wrote:

> Here is the code that I'm using to test
> https://github.com/mingfang/apache-helix/tree/master/helix-example
>
> In ZAC.java line 134 is where I'm adding the constraint.
> Line 204 is where I'm setting the state transition priority list.
>
> The steps I'm using are
> 1-Run ZAC and wait for the StatusPrinter printouts
> 2-Run Node2 and wait for it to transition to MASTER
> 3-Run Node1
> At this point we see the problem: the external view says
> node1=SLAVE and node2=SLAVE.
>
> I can get the MessageThrottleStage to work by replacing line 205 with this:
>           String key=item.toString();
> But even with message throttling working, I still can't get the transition
> sequence I need.
>
>
> On May 7, 2013, at 11:43 AM, kishore g <g.kishore@gmail.com> wrote:
>
> Can you provide the code snippet you used to add the constraint?
> It looks like you are setting the constraint at the INSTANCE level.
>
>
>
>
> On Mon, May 6, 2013 at 9:52 PM, Ming Fang <mingfang@mac.com> wrote:
>
>> I almost have this working.
>> However I'm experiencing a potential bug in MessageThrottleStage line 205.
>> The problem is that the throttleMap's key contains INSTANCE=<id> in it.
>> This effectively makes it impossible to throttle across the entire
>> cluster.
>>
>> On Apr 24, 2013, at 2:07 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>> > Hi Ming, to set the constraint so that only one transition message is
>> > sent at a time, you can take a look at the test example in
>> > TestMessageThrottle. You need to add a message constraint as follows:
>> >
>> > // build a message constraint
>> > ConstraintItemBuilder builder = new ConstraintItemBuilder();
>> > builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
>> >   .addConstraintAttribute("INSTANCE", ".*")
>> >   .addConstraintAttribute("CONSTRAINT_VALUE", "1");
>> >
>> > // add the constraint to the cluster
>> > helixAdmin.setConstraint(clusterName, ConstraintType.MESSAGE_CONSTRAINT,
>> > "constraint1", builder.build());
>> >
>> >
>> > The message constraint is separate from the ideal state and is not
>> > specified in the ideal state's JSON file.
>> >
>> > Thanks,
>> > Jason
>> >
>> >
>> >
>> >
>> > On 4/23/13 2:40 PM, "Ming Fang" <mingfang@mac.com> wrote:
>> >
>> >> Kishore
>> >>
>> >> It sounds like the solution is to set the constraints so that only one
>> >> transition at a time.
>> >> Can you point me to an example of how to do this?
>> >> Also is this something I can set in the JSON file?
>> >>
>> >> Sent from my iPad
>> >>
>> >> On Apr 1, 2013, at 11:32 AM, kishore g <g.kishore@gmail.com> wrote:
>> >>
>> >>> Hi Ming,
>> >>>
>> >>> Thanks for the detailed explanation. Actually 5 & 6 happen in
>> >>> parallel; Helix tries to parallelize the transitions as much as
>> >>> possible.
>> >>>
>> >>> There is another feature in Helix that allows you to sort the
>> >>> transitions based on priority; see STATE_TRANSITION_PRIORITY_LIST in
>> >>> the state model definition. But after sorting, Helix will send as many
>> >>> as possible in parallel without violating constraints.
>> >>>
>> >>> In your case you want the priority to be S-M, O-S, M-S, but that is
>> >>> not sufficient, since O-S and M-S will be sent in parallel.
>> >>>
>> >>> Additionally, you need to set a constraint that there should be only
>> >>> one transition per partition at any time. This will basically make the
>> >>> order 6, 5, 7, and they will be executed sequentially per partition.
>> >>>
>> >>> We will try this out and let you know; you don't need to change any
>> >>> code in Helix or your app. You should be able to tweak the
>> >>> configuration dynamically.
>> >>>
>> >>> We will try to think of a more elegant way to solve this. I will file
>> >>> a JIRA and add more info.
>> >>>
>> >>> I also want to ask: if a node that comes up must talk to the MASTER,
>> >>> what happens when the nodes are started for the first time, or when
>> >>> all nodes crash and come back?
>> >>>
>> >>> thanks,
>> >>> Kishore G
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >
>>
>>
>
>
