hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7371) NPE in ServiceMaster after RM is restarted and then the ServiceMaster is killed
Date Thu, 02 Nov 2017 19:59:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236488#comment-16236488 ]

Jian He commented on YARN-7371:

Patch looks good to me overall. Some comments:
- This method can be removed, as it’s only used by this class itself:
public Token createContainerToken(ContainerId containerId,
    int containerVersion, NodeId nodeId, String appSubmitter,
    Resource capability, Priority priority, long createTime,
    LogAggregationContext logAggregationContext, String nodeLabelExpression,
    ContainerType containerType) {
  return createContainerToken(containerId, containerVersion, nodeId,
      appSubmitter, capability, priority, createTime, null, null,
      ContainerType.TASK, ExecutionType.GUARANTEED, -1);
}
- For testRecoverComponentsAfterRMRestart, can you also check that the containers retrieved
by serviceClient#getStatus are the old containers of the 1st attempt, i.e. that no containers are
relaunched because of the AM restart?
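
A minimal sketch of that check, using plain string IDs instead of the real YARN types. The class name, method names, and container IDs below are hypothetical and not part of the patch; in the actual test the two lists would be built from the containers reported by serviceClient#getStatus before and after the AM restart.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the patch: compares the container IDs
// reported before the AM restart with those reported after recovery.
public class ContainerRecoveryCheck {

    /**
     * Returns the container IDs seen after the restart that were not present
     * before it. An empty result means no container was relaunched.
     */
    public static Set<String> relaunchedContainers(List<String> beforeRestart,
                                                   List<String> afterRestart) {
        Set<String> unexpected = new HashSet<>(afterRestart);
        unexpected.removeAll(new HashSet<>(beforeRestart));
        return unexpected;
    }

    /** True when exactly the same set of containers is reported both times. */
    public static boolean sameContainers(List<String> beforeRestart,
                                         List<String> afterRestart) {
        return new HashSet<>(beforeRestart).equals(new HashSet<>(afterRestart));
    }

    public static void main(String[] args) {
        // Made-up IDs for illustration only.
        List<String> before = Arrays.asList(
            "container_1_0001_01_000002", "container_1_0001_01_000003");
        List<String> recovered = Arrays.asList(
            "container_1_0001_01_000003", "container_1_0001_01_000002");
        List<String> relaunched = Arrays.asList(
            "container_1_0001_01_000002", "container_1_0001_02_000002");

        // Same containers in a different order: recovery kept the old ones.
        System.out.println(sameContainers(before, recovered));
        // One ID is new, so something was relaunched by the second attempt.
        System.out.println(relaunchedContainers(before, relaunched));
    }
}
```

Comparing sets rather than lists keeps the check independent of the order in which the status call returns containers.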

> NPE in ServiceMaster after RM is restarted and then the ServiceMaster is killed
> -------------------------------------------------------------------------------
>                 Key: YARN-7371
>                 URL: https://issues.apache.org/jira/browse/YARN-7371
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7371-yarn-native-services.001.patch, YARN-7371-yarn-native-services.002.patch,
>                      YARN-7371-yarn-native-services.003.patch, YARN-7371-yarn-native-services.004.patch,
>                      YARN-7371-yarn-native-services.005.patch
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.service.ServiceScheduler.recoverComponents(ServiceScheduler.java:313)
> at org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:265)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:150)
> Steps:
> 1. Stopped RM and then started it
> 2. Application was still running
> 3. Killed the ServiceMaster to check if it recovers
> 4. Next attempt failed with the above exception

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
