helix-user mailing list archives

From Tejeswar Das <tejes...@gmail.com>
Subject Issues with CustomMasterSlave model (cluster not converging on ideal state)
Date Tue, 27 Jun 2017 18:35:48 GMT
Hi,

I am trying to build a custom MasterSlave model. It differs from Helix’s out-of-the-box
MasterSlave model in the following ways:

In addition to MASTER, there are SYNCREPLICA and ASYNCREPLICA states.

For each partition, there will be one MASTER and a small number (R) of sync-replicas. The
master considers a message committed only once it has been synchronously replicated to all
sync-replicas (push model).

The async-replicas run on the remaining nodes and replicate asynchronously from the
master (pull model).

The upper-bound for MASTER is 1.
The dynamic upper-bound for SYNCREPLICA is “R” (replica count)
The dynamic upper-bound for ASYNCREPLICA is “N” (node count)
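
To make the three bounds concrete, here is a minimal sketch of how they resolve to numbers. It assumes the convention from the Helix tutorial that "R" is derived from the resource's replica count and "N" from the number of nodes. The class and method names are hypothetical and not part of Helix; the values 2 and 5 are the replica and node counts mentioned later in this message.

```java
class BoundResolverSketch {
    // Hypothetical helper, not a Helix API.
    // "R" resolves to the replica count, "N" to the node count,
    // and anything else is parsed as a fixed number.
    static int resolve(String bound, int replicas, int nodes) {
        if ("R".equals(bound)) {
            return replicas;
        }
        if ("N".equals(bound)) {
            return nodes;
        }
        return Integer.parseInt(bound);
    }

    public static void main(String[] args) {
        int replicas = 2; // replica count used for the rebalance
        int nodes = 5;    // nodes brought up in the cluster
        System.out.println("MASTER       <= " + resolve("1", replicas, nodes));
        System.out.println("SYNCREPLICA  <= " + resolve("R", replicas, nodes));
        System.out.println("ASYNCREPLICA <= " + resolve("N", replicas, nodes));
    }
}
```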

Here is a snippet of my code. (I am using Helix 0.7.1 and following this documentation:
http://helix.apache.org/0.7.1-docs/tutorial_state.html)

        // upper bounds
        builder.upperBound(States.MASTER.name(), 1);
        builder.dynamicUpperBound(States.SYNCREPLICA.name(), "R");
        builder.dynamicUpperBound(States.ASYNCREPLICA.name(), "N");

With the above code, I get the following exception during the initial configuration
of the cluster.

Exception in thread "main" org.apache.helix.HelixException: Invalid or unsupported state model definition
	at org.apache.helix.manager.zk.ZKHelixAdmin.rebalance(ZKHelixAdmin.java:895)
	at org.apache.helix.manager.zk.ZKHelixAdmin.rebalance(ZKHelixAdmin.java:844)
	at org.apache.helix.manager.zk.ZKHelixAdmin.rebalance(ZKHelixAdmin.java:824)

I saw the same exception with the following code too:

        // upper bounds
        builder.upperBound(States.MASTER.name(), 1);
        builder.upperBound(States.SYNCREPLICA.name(), 3);
        builder.dynamicUpperBound(States.ASYNCREPLICA.name(), "N");

Next, I tried something different. Instead of providing an upper bound for two states, I
provided an upper bound for only one state. Here is the code snippet:

        // upper bounds
        builder.upperBound(States.MASTER.name(), 1);
        builder.upperBound(States.SYNCREPLICA.name(), 3);
        builder.dynamicUpperBound(States.ASYNCREPLICA.name(), "R");

With the above code snippet, the initial configuration/rebalance completed successfully. (3
partitions and 2 replicas)
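
One rule that would be consistent with all three outcomes above (two failures, one success) is sketched below. This is only a hypothesis to check against the ZKHelixAdmin source, not a quotation of it: it assumes the rebalance call accepts at most one state bounded by "1", at most one bounded by "R", allows "N" only when neither of those has been seen, and ignores other fixed bounds such as "3". The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class RebalanceCheckSketch {
    // Hypothetical check, not Helix code. Bounds are listed in state priority
    // order (MASTER, SYNCREPLICA, ASYNCREPLICA). Returns true if accepted.
    static boolean accepted(List<String> boundsInPriorityOrder) {
        boolean haveOne = false; // a state bounded by "1" has been seen
        boolean haveR = false;   // a state bounded by "R" has been seen
        for (String bound : boundsInPriorityOrder) {
            if (bound.equals("1")) {
                if (haveOne) {
                    return false;
                }
                haveOne = true;
            } else if (bound.equalsIgnoreCase("R")) {
                if (haveR) {
                    return false;
                }
                haveR = true;
            } else if (bound.equalsIgnoreCase("N")) {
                // Assumed: "N" is only accepted when it is the sole bound of interest.
                if (haveOne || haveR) {
                    return false;
                }
                haveOne = true;
                haveR = true;
            }
            // Any other bound, such as "3", is ignored by this sketch.
        }
        return haveOne || haveR;
    }

    public static void main(String[] args) {
        System.out.println(accepted(Arrays.asList("1", "R", "N"))); // first snippet
        System.out.println(accepted(Arrays.asList("1", "3", "N"))); // second snippet
        System.out.println(accepted(Arrays.asList("1", "3", "R"))); // third snippet
    }
}
```

If this hypothesis holds, any definition that combines a "1" or "R" bound with an "N" bound would be rejected by this code path, regardless of the transitions.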

However, when I bring up 5 nodes, I see a few nodes toggling between two states for a
while before settling in one state, and the cluster does not seem to reach the ideal
state.

One node toggled between MASTER<->SYNCREPLICA
Another node toggled between SYNCREPLICA<->ASYNCREPLICA, and finally went to OFFLINE->DROPPED
A third node toggled between OFFLINE<->ASYNCREPLICA, before finally settling into ASYNCREPLICA->SYNCREPLICA->MASTER

After all the 5 nodes settled down, I still did not see any of the nodes in ASYNCREPLICA state.

Here is the ExternalView: 

{
  "id" : "msshard",
  "mapFields" : {
    "msshard_0" : {
      "MS_instance0" : "MASTER",
      "MS_instance2" : "SYNCREPLICA"
    },
    "msshard_1" : {
      "MS_instance1" : "MASTER",
      "MS_instance3" : "SYNCREPLICA"
    },
    "msshard_2" : {
      "MS_instance0" : "SYNCREPLICA",
      "MS_instance4" : "MASTER"
    }
  },
  "listFields" : {
  },
  "simpleFields" : {
    "BUCKET_SIZE" : "0"
  }
}
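
The ExternalView above shows exactly two instances per partition, one MASTER and one SYNCREPLICA. A simplified model that reproduces this is sketched below. It assumes that each partition gets as many slots as the replica count (2 here) and that states are filled in priority order, each up to its upper bound. This is an illustration, not Helix's actual rebalancer code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StateFillSketch {
    // Fills states in priority order, each up to its bound,
    // until the replica slots for one partition run out.
    static Map<String, Integer> fill(Map<String, Integer> boundsByPriority, int replicas) {
        Map<String, Integer> assigned = new LinkedHashMap<>();
        int remaining = replicas;
        for (Map.Entry<String, Integer> entry : boundsByPriority.entrySet()) {
            int take = Math.min(entry.getValue(), remaining);
            assigned.put(entry.getKey(), take);
            remaining -= take;
        }
        return assigned;
    }

    // Bounds from this message: MASTER 1, SYNCREPLICA 3, ASYNCREPLICA "R".
    static Map<String, Integer> exampleFromMessage(int replicas) {
        Map<String, Integer> bounds = new LinkedHashMap<>();
        bounds.put("MASTER", 1);
        bounds.put("SYNCREPLICA", 3);
        bounds.put("ASYNCREPLICA", replicas); // "R" resolved to the replica count
        return fill(bounds, replicas);
    }

    public static void main(String[] args) {
        // prints {MASTER=1, SYNCREPLICA=1, ASYNCREPLICA=0}
        System.out.println(exampleFromMessage(2));
    }
}
```

Under this model, SYNCREPLICA absorbs every slot left after MASTER until its bound of 3 is reached, so an ASYNCREPLICA would only appear once the replica count exceeds 4.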

I would really appreciate help with this. Here is the complete code for building the state
model. I am wondering whether an issue in the transitions below is causing the problem:


    public static enum States {
        MASTER, SYNCREPLICA, ASYNCREPLICA, OFFLINE
    }

    public static StateModelDefinition buildCustomMasterSlaveModel() {
        StateModelDefinition.Builder builder = new StateModelDefinition.Builder("CustomMasterSlave");
        builder.initialState(States.OFFLINE.name());

        // add states
        builder.addState(States.MASTER.name(), 0);
        builder.addState(States.SYNCREPLICA.name(), 1);
        builder.addState(States.ASYNCREPLICA.name(), 2);
        builder.addState(States.OFFLINE.name(), 3);

        for (HelixDefinedState state : HelixDefinedState.values()) {
            builder.addState(state.name());
        }

        // add transitions
        builder.addTransition(States.SYNCREPLICA.name(), States.MASTER.name(), 1);
        builder.addTransition(States.ASYNCREPLICA.name(), States.SYNCREPLICA.name(), 2);
        builder.addTransition(States.OFFLINE.name(), States.ASYNCREPLICA.name(), 3);
        builder.addTransition(States.MASTER.name(), States.SYNCREPLICA.name(), 4);
        builder.addTransition(States.SYNCREPLICA.name(), States.ASYNCREPLICA.name(), 4);
        builder.addTransition(States.ASYNCREPLICA.name(), States.OFFLINE.name(), 4);
        builder.addTransition(States.OFFLINE.name(), HelixDefinedState.DROPPED.name());

        // upper bounds
        builder.upperBound(States.MASTER.name(), 1);
        builder.upperBound(States.SYNCREPLICA.name(), 3);
        builder.dynamicUpperBound(States.ASYNCREPLICA.name(), "R");

        return builder.build();
    }

Thanks and regards
Tej


