hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.
Date Mon, 15 May 2017 18:35:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011073#comment-16011073
] 

Wangda Tan commented on YARN-6585:
----------------------------------

Thanks [~eepayne]/[~nroberts]/[~sunilg] for reporting and investigating this issue.

[~sunilg] I don't think this fix is correct: reversing the fields is an incompatible change.
In branch-2.7, AddToClusterNodeLabelsRequestProto has the string node labels as its 1st field. In existing
branch-2.8, we added a NodeLabelProto as the 2nd field and renamed the 1st field to "deprecated-".
So far this is compatible.
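
To illustrate why a rename is compatible while swapping field numbers is not: on the protobuf wire, a field is identified only by its number, never by its name. The following is a minimal sketch with a hand-rolled encoder, not Hadoop or protobuf library code. The class and method names ({{WireTagSketch}}, {{encodeField}}, {{readStrings}}) are made up for illustration, and it assumes every field is length-delimited with a field number below 16 and a payload shorter than 128 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hand-rolled protobuf wire encoding, not Hadoop code.
class WireTagSketch {
  static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

  // Encodes one length-delimited field. Assumes fieldNumber < 16 and
  // payload.length < 128 so the tag and the length each fit in one byte.
  static byte[] encodeField(int fieldNumber, byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write((fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED); // tag byte
    out.write(payload.length);                                  // length byte
    out.write(payload, 0, payload.length);                      // raw payload
    return out.toByteArray();
  }

  // Returns the payloads of all fields whose number equals wantedField.
  // Assumes the message contains only length-delimited fields.
  static List<String> readStrings(byte[] message, int wantedField) {
    List<String> result = new ArrayList<>();
    int pos = 0;
    while (pos < message.length) {
      int tag = message[pos++] & 0xFF;
      int length = message[pos++] & 0xFF;
      if ((tag >>> 3) == wantedField) {
        result.add(new String(message, pos, length, StandardCharsets.UTF_8));
      }
      pos += length; // skip over the payload, matched or not
    }
    return result;
  }

  public static void main(String[] args) {
    // A 2.7-style record: the label string is stored under field number 1.
    byte[] stored = encodeField(1, "abc".getBytes(StandardCharsets.UTF_8));
    System.out.println("field 1: " + readStrings(stored, 1));
    System.out.println("field 2: " + readStrings(stored, 2));
  }
}
```

A reader that looks for the label under field 1 finds it whatever the field is called, while a reader that looks under field 2 finds nothing. Renaming the 1st field therefore leaves stored data readable, whereas reversing the field numbers would change what existing bytes mean.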

The problem is in the existing implementation:
{code}
  private void initLocalNodeLabels() {
    AddToClusterNodeLabelsRequestProtoOrBuilder p = viaProto ? proto : builder;
    List<NodeLabelProto> attributesProtoList = p.getNodeLabelsList();
    this.updatedNodeLabels = new ArrayList<NodeLabel>();
    for (NodeLabelProto r : attributesProtoList) {
      this.updatedNodeLabels.add(convertFromProtoFormat(r));
    }
  }
{code} 

This method in {{AddToClusterNodeLabelsRequestPBImpl}} doesn't read from the deprecated node label string
field (1st). In FileSystemNodeLabelsStore, YARN reads the serialized PB message and calls {{new
AddToClusterNodeLabelsRequestPBImpl(AddToClusterNodeLabelsRequestProto proto)}}. If it fails
to read from the 2nd field, it should try to read from the 1st one.
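
A rough sketch of the fallback described above, using hypothetical stand-in classes rather than the generated protobuf types: {{StoredRequest}} stands in for AddToClusterNodeLabelsRequestProto and {{LabelEntry}} for NodeLabelProto. The names {{loadLabels}} and {{deprecatedNodeLabels}}, and the choice to mark labels recovered from the 1st field as exclusive, are assumptions made for illustration only, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the generated protobuf classes; not the Hadoop API.
class NodeLabelFallbackSketch {
  // Stands in for NodeLabelProto.
  static final class LabelEntry {
    final String name;
    final boolean exclusive;

    LabelEntry(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }

  // Stands in for AddToClusterNodeLabelsRequestProto.
  static final class StoredRequest {
    final List<String> deprecatedNodeLabels; // 1st field, written by branch-2.7
    final List<LabelEntry> nodeLabels;       // 2nd field, written by branch-2.8

    StoredRequest(List<String> deprecatedNodeLabels, List<LabelEntry> nodeLabels) {
      this.deprecatedNodeLabels = deprecatedNodeLabels;
      this.nodeLabels = nodeLabels;
    }
  }

  // Prefer the 2nd field; fall back to the 1st field when the 2nd is empty.
  static List<LabelEntry> loadLabels(StoredRequest p) {
    List<LabelEntry> labels = new ArrayList<>(p.nodeLabels);
    if (labels.isEmpty()) {
      for (String name : p.deprecatedNodeLabels) {
        // Assumption: labels that only exist as strings are treated as exclusive.
        labels.add(new LabelEntry(name, true));
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    List<String> oldNames = new ArrayList<>();
    oldNames.add("abc");
    StoredRequest fromBranch27 = new StoredRequest(oldNames, new ArrayList<LabelEntry>());
    for (LabelEntry e : loadLabels(fromBranch27)) {
      System.out.println(e.name + " exclusive=" + e.exclusive);
    }
  }
}
```

With a check like this, a store file written by branch-2.7 would still yield its labels, and a file written by branch-2.8 would be read exactly as before.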

To make sure we have enough coverage, I suggest adding a unit test that reads a node label file stored
by branch-2.7 and verifies that all fields can be read by branch-2.8 and above.

Thoughts?

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6585
>                 URL: https://issues.apache.org/jira/browse/YARN-6585
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eric Payne
>            Assignee: Sunil G
>            Priority: Blocker
>         Attachments: YARN-6585.0001.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by known label collections, please check, new labels=[abc]
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
>         at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


