helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-654) Rebalance running task
Date Thu, 18 May 2017 00:07:04 GMT

    [ https://issues.apache.org/jira/browse/HELIX-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014967#comment-16014967 ]

ASF GitHub Bot commented on HELIX-654:
--------------------------------------

Github user jiajunwang commented on a diff in the pull request:

    https://github.com/apache/helix/pull/88#discussion_r117137720
  
    --- Diff: helix-core/src/main/java/org/apache/helix/task/JobRebalancer.java ---
    @@ -455,6 +454,44 @@ private ResourceAssignment computeResourceMapping(String jobResource,
         return ra;
       }
     
    +  /**
    +   * If assignment is different from previous assignment, drop the old running task if it's no
    +   * longer assigned to the same instance, but not removing it from excludeSet because the same task
    +   * should not be assigned to the new instance right way.
    +   */
    +  private void dropRebalancedRunningTasks(Map<String, SortedSet<Integer>> newAssignment,
    +      Map<String, SortedSet<Integer>> oldAssignment, Map<Integer, PartitionAssignment> paMap,
    +      JobContext jobContext) {
    +    for (String instance : oldAssignment.keySet()) {
    +      for (Integer pId : oldAssignment.get(instance)) {
    +        if (jobContext.getPartitionState(pId) == TaskPartitionState.RUNNING
    +            && !newAssignment.get(instance).contains(pId)) {
    +          paMap.put(pId, new PartitionAssignment(instance, TaskPartitionState.DROPPED.name()));
    +          jobContext.setPartitionState(pId, TaskPartitionState.DROPPED);
    --- End diff --
    
    Do we need to set DROPPED here?
    The new state will be updated by updateJobContextAndGetTaskCurrentState() in the next round, right?
    
    One problem with setting DROPPED here is that if the participant cannot cancel the job quickly, its status will still be RUNNING. In the first round, the controller sets the state to DROPPED. In the second round, it is changed back to RUNNING. Although the state will eventually be correct, it is confusing in the meantime.
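
The flip-flop described in this comment can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions: the class, enum, and method names below are hypothetical and are not Helix APIs, and the model assumes the controller refreshes its recorded state from what the participant reports at the start of every round.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the reviewer's scenario; none of these names are Helix APIs. */
class DroppedStateFlipFlop {
  enum State { RUNNING, DROPPED }

  /**
   * Returns the state recorded by the controller at the end of each round.
   *
   * @param writeDroppedEagerly whether the controller writes DROPPED itself in round 1
   * @param roundsUntilCancelled number of rounds the participant keeps reporting RUNNING
   * @param rounds total number of controller rounds to simulate
   */
  static List<State> observe(boolean writeDroppedEagerly, int roundsUntilCancelled, int rounds) {
    List<State> seen = new ArrayList<>();
    for (int round = 1; round <= rounds; round++) {
      // Assumption: each round starts by refreshing the recorded state
      // from what the participant currently reports.
      State recorded = round <= roundsUntilCancelled ? State.RUNNING : State.DROPPED;
      // In round 1 the rebalancer decides to move the task. The eager variant
      // overwrites the recorded state immediately.
      if (round == 1 && writeDroppedEagerly) {
        recorded = State.DROPPED;
      }
      seen.add(recorded);
    }
    return seen;
  }

  public static void main(String[] args) {
    // The participant needs two rounds before the cancel takes effect.
    System.out.println("eager:    " + observe(true, 2, 3));
    System.out.println("deferred: " + observe(false, 2, 3));
  }
}
```

In this model the eager variant records DROPPED, then RUNNING, then DROPPED, while the deferred variant only ever moves from RUNNING to DROPPED.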


> Rebalance running task
> ----------------------
>
>                 Key: HELIX-654
>                 URL: https://issues.apache.org/jira/browse/HELIX-654
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>            Reporter: Weihan Kong
>
> h3. Feature summary
> Helix Task Framework lets users run tasks on instances managed by Helix. There are two types of tasks: generic tasks and fixed-target tasks. A fixed-target task always follows its target partition and is rebalanced whenever that partition is rebalanced. For a generic task, Helix lets the user choose whether a running task is rebalanced when the cluster topology changes.
> For most users, it is better to leave this feature disabled (the default), since there is no need to re-run a task every time a new node is added. For users with long-running tasks, enabling this feature can be very useful: when a new node is added, the task load is better balanced across the cluster.
> h3. Defined system behavior
> h4. When a node fails,
> h6. Feature disabled:
> * Running tasks on the failed node will be rebalanced to a live node, since the tasks failed with the node and no longer exist.
> h6. Feature enabled:
> * Same.
> h4. When a new node is added,
> h6. Feature disabled:
> * Running tasks will continue to run on the current instance.
> * If a running task fails after a while, it might be rebalanced and run on other instances, according to the new rebalance assignment under the new cluster topology.
> h6. Feature enabled:
> * A running task might be cancelled and rebalanced immediately, according to the new rebalance assignment under the new cluster topology.
> h3. Configuration
> A job-level config field (RebalanceRunningTask) in JobConfig enables or disables this feature. It is false by default.
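
The behavior defined in the description above can be summarized as a small decision function. This is a minimal sketch, not Helix code: the class, enums, and method are hypothetical names introduced for illustration, and the boolean parameter stands in for the RebalanceRunningTask job config field.

```java
/** Hypothetical summary of the described behavior; none of these names are Helix APIs. */
class RebalanceRunningTaskPolicy {
  enum Event { NODE_FAILED, NODE_ADDED }

  enum Action {
    REASSIGN_TO_LIVE_NODE,      // the task failed with its node, so it must move
    KEEP_ON_CURRENT_INSTANCE,   // the running task is left alone
    MAY_CANCEL_AND_REASSIGN     // the task may be cancelled and moved immediately
  }

  /**
   * Decides what happens to a running generic task after a topology change.
   * For NODE_FAILED the task is assumed to have been running on the failed node.
   *
   * @param rebalanceRunningTask stand-in for the job-level config flag (false by default)
   */
  static Action onTopologyChange(Event event, boolean rebalanceRunningTask) {
    if (event == Event.NODE_FAILED) {
      // Same outcome whether the feature is enabled or disabled.
      return Action.REASSIGN_TO_LIVE_NODE;
    }
    return rebalanceRunningTask
        ? Action.MAY_CANCEL_AND_REASSIGN
        : Action.KEEP_ON_CURRENT_INSTANCE;
  }

  public static void main(String[] args) {
    for (Event event : Event.values()) {
      for (boolean enabled : new boolean[] {false, true}) {
        System.out.println(event + ", enabled=" + enabled + " -> "
            + onTopologyChange(event, enabled));
      }
    }
  }
}
```

With the feature disabled, a task kept on its current instance can still end up elsewhere later if it fails and is reassigned under the new topology; the sketch only covers the immediate decision.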



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
