helix-commits mailing list archives

From "Jiajun Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HELIX-659) Extend Helix to Support Resource with Multiple States
Date Mon, 10 Jul 2017 19:26:02 GMT

    [ https://issues.apache.org/jira/browse/HELIX-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080926#comment-16080926 ]

Jiajun Wang edited comment on HELIX-659 at 7/10/17 7:25 PM:
------------------------------------------------------------

h2. Design Details

h3. Register Secondary States Model / Factory

Note that if a secondary state model is a dynamic state model, defaultTransitionHandler has to be
implemented.

*State Model Factory*

public abstract class DynamicStateModelFactory extends StateModelFactory<DynamicStateModel>
{
  ...
}
  
public abstract class DynamicStateModel extends StateModel {
  static final String DEFAULT_INITIAL_STATE = "UNKNOWN";
  protected String _currentState = DEFAULT_INITIAL_STATE;
 
  public String getCurrentState() {
    return _currentState;
  }
 
  // !!!!!!!!!!! Changed part !!!!!!!!!!!! //
  @Transition(from = "from", to = "to")
  public void defaultTransitionHandler(Message message, NotificationContext context) {
    logger.error("Default transition handler. The idea is to invoke this if no transition method "
        + "is found. To be implemented");
  }
 
  public boolean updateState(String newState) {
    _currentState = newState;
    return true;
  }
 
  public void rollbackOnError(Message message, NotificationContext context,
      StateTransitionError error) {
    logger.error("Default rollback method invoked on error. Error Code: " + error.getCode());
  }
 
  public void reset() {
    logger.warn("Default reset method invoked. Either because the process no longer owns this "
        + "resource or the session timed out");
  }
 
  // !!!!!!!!!! Internal states such as ERROR will still exist and be supported !!!!!!!!!! //
  @Transition(to = "DROPPED", from = "ERROR")
  public void onBecomeDroppedFromError(Message message, NotificationContext context)
      throws Exception {
    logger.info("Default ERROR->DROPPED transition invoked.");
  }
}
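
The fallback described in the comment above ("invoke this if no transition method is found") could be sketched as follows. This is only an illustration under stated assumptions: the `Transition` annotation, `VersionModel`, and `dispatch` below are hypothetical stand-ins defined locally, not the Helix classes, and the handler signatures are simplified to plain strings.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class DispatchSketch {
  private DispatchSketch() {}

  /** Local stand-in for a transition annotation (assumption, not the Helix one). */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Transition {
    String from();
    String to();
  }

  /** Hypothetical dynamic state model that records the last handler invoked. */
  static class VersionModel {
    String last = "";

    @Transition(from = "ERROR", to = "DROPPED")
    public void onBecomeDroppedFromError() {
      last = "dropped";
    }

    public void defaultTransitionHandler(String from, String to) {
      last = from + "->" + to;
    }
  }

  /** Invokes the annotated method matching (from, to), else the default handler. */
  static void dispatch(VersionModel model, String from, String to) {
    for (Method m : model.getClass().getMethods()) {
      Transition t = m.getAnnotation(Transition.class);
      if (t != null && t.from().equals(from) && t.to().equals(to)) {
        try {
          m.invoke(model);
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Transition failed: " + from + "->" + to, e);
        }
        return;
      }
    }
    // No explicit transition method exists for this pair, which is the normal
    // case for a dynamic state such as a version string.
    model.defaultTransitionHandler(from, to);
  }
}
```

With this sketch, an ERROR to DROPPED request reaches the annotated method, while a version change such as 1.0.0 to 1.0.1 falls through to the default handler.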

h2. Resource Configuration

Secondary states are conceptually map values.
Besides the state itself, each state model may have a different factory name as well. So there
will be two mappings: <StateModel, Factory> and <StateModel, State>.

We keep the existing design in which:
1. State configurations are at the partition level.
2. State factory configurations are at the resource level.

To allow multiple states to be configured, we propose representing them in JSON string
format. Note that the state model name is used as the key, so a state model cannot be used more
than once in one partition.

*Resource config with secondary state VERSION*

{
  "id":"Test_Resource"
  ,"simpleFields":{
    "SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
  }
  ,"mapFields":{
    "partition_1" : "{VERSION : 1.0.1}"
    ,"partition_2" : "{VERSION : 1.0.2}"
  }
}
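
As a rough sketch of how a value such as "{VERSION : 1.0.1}" could be read and written, the helper below parses the string into a map keyed by state model name and rejects duplicate models. `SecondaryStateCodec` is a hypothetical name, and the comma-separated form for more than one model is an assumption, since the examples here only show a single entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Helix. */
final class SecondaryStateCodec {
  private SecondaryStateCodec() {}

  /** Parses "{MODEL : VALUE, MODEL2 : VALUE2}" into an ordered map keyed by model name. */
  static Map<String, String> parse(String encoded) {
    String trimmed = encoded.trim();
    if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
      throw new IllegalArgumentException("Expected {...} but got: " + encoded);
    }
    Map<String, String> states = new LinkedHashMap<>();
    String body = trimmed.substring(1, trimmed.length() - 1).trim();
    if (body.isEmpty()) {
      return states;
    }
    // Assumption: multiple entries are separated by commas.
    for (String entry : body.split(",")) {
      String[] kv = entry.split(":", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + entry);
      }
      String model = kv[0].trim();
      // The state model name is the key, so each model may appear only once.
      if (states.putIfAbsent(model, kv[1].trim()) != null) {
        throw new IllegalArgumentException("Duplicate state model: " + model);
      }
    }
    return states;
  }

  /** Formats the map back into the "{MODEL : VALUE}" form. */
  static String format(Map<String, String> states) {
    StringBuilder sb = new StringBuilder("{");
    for (Map.Entry<String, String> e : states.entrySet()) {
      if (sb.length() > 1) {
        sb.append(", ");
      }
      sb.append(e.getKey()).append(" : ").append(e.getValue());
    }
    return sb.append("}").toString();
  }
}
```

Parsing "{VERSION : 1.0.1}" yields a single entry VERSION mapped to 1.0.1, and formatting that map gives the original string back.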

*Additional APIs to configure secondary states*

/**
 * Set configuration values
 * @param scope
 * @param listProperties
 */
void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);

/**
 * Get configuration values
 * @param scope
 * @param keys
 * @return configuration values ordered by the provided keys
 */
Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);

h3. Partitions with the Secondary States shown in Current State and External View

The current state shows both the secondary state models and their states, in the same format as the
resource configuration.

*Current States*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF":"MasterSlave"
    ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
    ,"BUCKET_SIZE":"0"
    ,"SESSION_ID":"25b2ce5dfbde0fa"
    ,"SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "partition_1":{
      "CURRENT_STATE":"MASTER"
      ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
      ,"INFO":""
    }
    ,"partition_2":{
      "CURRENT_STATE":"SLAVE"
      ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
      ,"INFO":""
    }
  }
}

As for the external view, we have 2 options for showing secondary states.
1. Compressing all states by combining the main state with the secondary states. The states are
separated by ":".

*Secondary state in External View*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
    ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "example_resource_0":{
      "app0004.stg.com_11900":"{MasterSlave : MASTER} : {VERSION : 1.0.1}"
      ,"app0048.stg.com_11900":"{MasterSlave : SLAVE} : {VERSION : 1.0.0}"
    }
  }
}
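
The combined value used in option 1 could be produced and read back along these lines. This is a sketch only: `CombinedStateCodec` is a hypothetical name, and it assumes that state values never contain braces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical codec for values like "{MasterSlave : MASTER} : {VERSION : 1.0.1}". */
final class CombinedStateCodec {
  private CombinedStateCodec() {}

  /** Joins one "{MODEL : STATE}" group per entry, separated by " : ". */
  static String encode(Map<String, String> statesByModel) {
    List<String> groups = new ArrayList<>();
    for (Map.Entry<String, String> e : statesByModel.entrySet()) {
      groups.add("{" + e.getKey() + " : " + e.getValue() + "}");
    }
    return String.join(" : ", groups);
  }

  /** Splits the combined string back into a map keyed by state model name. */
  static Map<String, String> decode(String combined) {
    Map<String, String> result = new LinkedHashMap<>();
    // Groups are separated by a ":" that sits between "}" and "{".
    for (String group : combined.trim().split("\\}\\s*:\\s*\\{")) {
      String body = group.replace("{", "").replace("}", "").trim();
      String[] kv = body.split(":", 2);
      if (kv.length != 2) {
        // A legacy value such as "MASTER" ends up here.
        throw new IllegalArgumentException("Not a combined state: " + combined);
      }
      result.put(kv[0].trim(), kv[1].trim());
    }
    return result;
  }
}
```

Note that a plain legacy value such as "MASTER" is rejected by `decode`, so old and new readers cannot share one format without extra handling.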

2. Adding new fields for showing secondary states separately.

*Secondary state in External View*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
    ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "example_resource_0":{
      "app0004.stg.com_11900":"MASTER"
      ,"app0048.stg.com_11900":"SLAVE"
      ,"app0004.stg.com_11900_SECONDARY_STATE":"{VERSION : 1.0.1}"
      ,"app0048.stg.com_11900_SECONDARY_STATE":"{VERSION : 1.0.0}"
    }
  }
}

Both options have backward-compatibility issues. The first design changes the state string, so
legacy clients won't be able to interpret it. The second design adds items to the map fields, so
applications that read this map for all partitions will find additional partitions, and their
names will be incorrect.
Comparing the 2 options, the first one fits our long-term goals much better, so it is our
choice for phase one.
To address the backward-compatibility issue, we plan to create an additional external view ZK node
that holds the new format. The old external view node will be kept the same.

h3. State Transition Message

When multiple states change, the messages are sent in order of priority. There won't
be parallel state transitions on one partition.

h3. Helix Controller Updates

When resource configuration is changed:

* Fill ClusterDataCache with secondary states and state models/factories.
* Compute the status delta and compose messages accordingly. Order the messages by state model priority.
* Send the highest-priority message to the participant.
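
The controller steps above could be sketched roughly as follows. All names here are hypothetical, and representing priority as an ordered list of state model names (highest first) is an assumption, since the proposal does not say how priority is stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class TransitionPlanner {
  private TransitionPlanner() {}

  /** Hypothetical pending message; real Helix messages carry more fields. */
  static final class PendingTransition {
    final String stateModel;
    final String fromState;
    final String toState;

    PendingTransition(String stateModel, String fromState, String toState) {
      this.stateModel = stateModel;
      this.fromState = fromState;
      this.toState = toState;
    }
  }

  /**
   * Compares current and target states per model and returns the needed
   * transitions, ordered by the given priority list (highest first).
   */
  static List<PendingTransition> plan(List<String> modelsByPriority,
      Map<String, String> current, Map<String, String> target) {
    List<PendingTransition> pending = new ArrayList<>();
    for (String model : modelsByPriority) {
      String from = current.get(model);
      String to = target.get(model);
      if (to != null && !to.equals(from)) {
        pending.add(new PendingTransition(model, from, to));
      }
    }
    return pending;
  }

  /** Only the highest-priority message is sent, so a partition never transitions in parallel. */
  static Optional<PendingTransition> nextToSend(List<PendingTransition> pending) {
    return pending.isEmpty() ? Optional.empty() : Optional.of(pending.get(0));
  }
}
```

The remaining transitions would be recomputed in a later pass once the participant reports its new current state.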

One optimization opportunity is allowing parallel state transition messages if there is no
conflict.

When a participant's current state changes:

* Read the secondary states and fill the new external view ZK node with the complete, encoded status information.

h3. Helix Participant Updates

On receiving state transition message:

* Check whether the message targets a registered state model. If so, trigger the state transition.
*   If any state transition fails, set an error state and stop processing. The user should
fix the problem and reset to the initial state.
*   If the state transition succeeds, update the current state.
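
A minimal sketch of this participant-side flow is shown below. `ParticipantSketch` and `TransitionHandler` are hypothetical names rather than Helix APIs, and messages are reduced to a state model name plus a target state.

```java
import java.util.HashMap;
import java.util.Map;

final class ParticipantSketch {
  /** Hypothetical callback that performs the actual transition work. */
  interface TransitionHandler {
    void apply(String fromState, String toState) throws Exception;
  }

  static final String ERROR = "ERROR";

  private final Map<String, TransitionHandler> handlers = new HashMap<>();
  private final Map<String, String> currentStates = new HashMap<>();

  void register(String stateModel, String initialState, TransitionHandler handler) {
    handlers.put(stateModel, handler);
    currentStates.put(stateModel, initialState);
  }

  /** Returns true only if the transition was executed and the state updated. */
  boolean onMessage(String stateModel, String toState) {
    TransitionHandler handler = handlers.get(stateModel);
    if (handler == null) {
      return false; // not a registered state model
    }
    String from = currentStates.get(stateModel);
    if (ERROR.equals(from)) {
      return false; // stop processing until the user resets the state
    }
    try {
      handler.apply(from, toState);
      currentStates.put(stateModel, toState);
      return true;
    } catch (Exception e) {
      currentStates.put(stateModel, ERROR);
      return false;
    }
  }

  /** Called after the user has fixed the problem. */
  void reset(String stateModel, String initialState) {
    currentStates.put(stateModel, initialState);
  }

  String currentState(String stateModel) {
    return currentStates.get(stateModel);
  }
}
```

After a failed transition, every later message for that state model is refused until `reset` is called, which mirrors the "stop processing" rule above.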

h2. Alternative Options for Supporting Additional States

h3. Introducing special state for additional status change

Add a new internal state UPGRADING (or other special states) for status changes.
Any additional status change then happens when a partition transitions "to" or "from" the UPGRADING
state.
Note that the application has the freedom to define whether UPGRADING is a special online status
or not. This decouples the main state from the additional "states".
For the Pinot case, upgrading partitions (even before they are back to ONLINE) might be active
partitions.

The problem with this new state is that it only works well for a single additional state model.
Once more than one state model has to be handled, and they change separately, the UPGRADING
state is not enough.

h3. Rely on resetting partition to load new "states"

Whenever new states are going to be set, the application updates the resource configuration and
then resets all partitions.
Then, during the state transition from offline to online, participants read the new states from
the configuration and apply them to the related partitions.

The problem is that changes in additional states affect the main state: the partition
will be offline for a while.

h3. Application registers additional message handler for customized transition message

In this method, the application owns the logic. Helix just dispatches a customized state transition
message to trigger the operation. In the message handler, the application reads and writes the
information of the additional state to the property store.

Considering that additional states are a generic requirement, letting multiple applications implement
similar logic separately does not make sense.


> Extend Helix to Support Resource with Multiple States
> -----------------------------------------------------
>
>                 Key: HELIX-659
>                 URL: https://issues.apache.org/jira/browse/HELIX-659
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>
> h1. Problem Statement
> h2. Single State Model vs. Multiple State Models
> Currently, each Helix resource is associated with a single state model, and each replica
> of a partition can be in only one of the states defined in that state model at any time.
> Helix manages state transitions based on this single state model.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2416&x=-11&y=71&w=517&h=198&store=1&accept=image%2F*&auth=LCA%20313ced8fb855e8fc1a7043f7fe91cdfa15fffb6b-ts%3D1498857664!
> However, in many scenarios, resources are too complicated to be modeled by a single
> state model.
> As an example, partitions of a resource could be described in different dimensions:
> Slave/Master state, Read or Write state, and version. They represent different dimensions
> of the overall resource status. States from each dimension are based on different state models.
> Note that the state machines are simplified in this document.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2416&x=-71&y=66&w=1822&h=308&store=1&accept=image%2F*&auth=LCA%2041fa743ba130f41786dee3527de6206cebdd4534-ts%3D1498857664!
> The basic idea is that states in these 3 dimensions are parallel and can be changed
> independently. For instance, the R/W state may be changed without updating the slave/master state.
> h2. Finite State Machine vs. Dynamic State Model
> In addition, Helix employs a finite state machine to define a state model. However, some
> state models cannot be easily modeled by a finite state machine with fixed states, for example,
> versions. We call such a state model a dynamic state model. It is read, set, and understood
> by the application. We will need to extend Helix to support such dynamic state models. Note
> that Helix should not and will not be able to calculate the best possible dynamic states.
> The version of a piece of software is one of the best examples for understanding dynamic state.
> Let's consider one application that is deployed on multiple nodes, which work together
> as a cluster. The green node works as the master, and all dark blue nodes are slaves. When
> admins upgrade the service from 1.0.0 to 1.1.0, they need to ensure that all nodes are upgraded to
> the new version before claiming that the upgrade is done. After the upgrade process, it is important
> to ensure that all software versions are consistent.
> If the Helix framework is leveraged to support upgrading the cluster, it will help to simplify
> application logic and ensure consistency. For instance, the service (cluster) itself is regarded
> as the resource, and each node is mapped to a partition. Then upgrading is simply a state
> transition. Admins can check the external view to ensure consistency.
> Note that during this version upgrade, the master node is still the master node, and the slave
> nodes are still slave nodes. So the version state is parallel to the other states.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2066&x=1466&y=922&w=560&h=455&store=1&accept=image%2F*&auth=LCA%20fa3d8fc0d113a82f4e94b127161cf91818a2fe64-ts%3D1497894598!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
