hadoop-yarn-issues mailing list archives

From "Suma Shivaprasad (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8710) Service AM should set a finite limit on NM container max retries
Date Fri, 05 Oct 2018 19:25:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640248#comment-16640248 ]

Suma Shivaprasad commented on YARN-8710:
----------------------------------------

Thanks [~shuzirra] for the review. The total retry count can be configured by setting both
"yarn.resourcemanager.am.max-attempts" and "yarn.service.am-restart.max-attempts", so that
AM retries can be scheduled on other nodes. This avoids delays from repeatedly trying to schedule
on a node that may be unreachable or unhealthy. A service user who still needs infinite NM retries
can override this behaviour by explicitly setting yarn.service.container-failure.retry.max in the
YARN service spec.
Please let me know if you still have any concerns on the patch.

> Service AM should set a finite limit on NM container max retries 
> -----------------------------------------------------------------
>
>                 Key: YARN-8710
>                 URL: https://issues.apache.org/jira/browse/YARN-8710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>         Attachments: YARN-8710.1.patch
>
>
> Container retries are currently set to a default of -1 in AbstractProviderService.buildContainerRetry.
> If this is not overridden via the service spec with a finite value for yarn.service.container-failure.retry.max,
> this causes infinite NM retries for the container under the ALWAYS/ON_FAILURE restart policies.
> Ideally it should retry a finite number of times on the same NM, after which the Service AM can
> retry on another node.
> We can set the default value to 3.
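
The default-versus-override behaviour described above can be sketched as a small model. This is only an illustration, not the code in the patch. The property name, the current default of -1, and the proposed default of 3 come from the issue text. The helper names (effective_retry_max, may_retry_on_same_nm), the simplified retry rule, and the exact placement of the property under a component's "configuration"/"properties" block are assumptions made for the example.

```python
import json

# Property name taken from the issue text.
RETRY_MAX_KEY = "yarn.service.container-failure.retry.max"
CURRENT_DEFAULT = -1   # default described in the issue: retry on the NM without limit
PROPOSED_DEFAULT = 3   # finite default proposed in the issue


def effective_retry_max(properties, default):
    """Hypothetical helper: use the spec value if present, else the default."""
    raw = properties.get(RETRY_MAX_KEY)
    return int(raw) if raw is not None else default


def may_retry_on_same_nm(retries_done, retry_max):
    """Simplified model: -1 means unlimited, otherwise stop at the limit."""
    return retry_max == -1 or retries_done < retry_max


# Assumed shape of a service spec in which the user opts back in to
# unlimited NM retries; the service and component fields are illustrative.
spec = {
    "name": "example-service",
    "components": [
        {
            "name": "worker",
            "number_of_containers": 1,
            "configuration": {"properties": {RETRY_MAX_KEY: "-1"}},
        }
    ],
}

if __name__ == "__main__":
    props = spec["components"][0]["configuration"]["properties"]
    print(json.dumps(spec, indent=2))
    print("with override:", effective_retry_max(props, PROPOSED_DEFAULT))
    print("without override:", effective_retry_max({}, PROPOSED_DEFAULT))
```

In this model, a spec that omits the property gets the finite limit, so the same NM stops retrying after three attempts and the Service AM can then retry on another node. A spec that sets the property to -1 keeps the old unlimited behaviour.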



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

