hadoop-yarn-issues mailing list archives

From "Billie Rinaldi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8080) YARN native service should support component restart policy
Date Mon, 07 May 2018 18:49:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466315#comment-16466315
] 

Billie Rinaldi commented on YARN-8080:
--------------------------------------

Another thing we should address is the concept of readiness for ON_FAILURE / NEVER component
instances. It seems like instances of these types shouldn't become READY unless they have
succeeded. Possibly this check could be added into the default readiness check.
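
One way to picture the readiness rule suggested above is the following minimal sketch. All names here (ReadinessSketch, RestartPolicy, isReady, probePassed) are hypothetical and are not taken from the YARN service code or from any YARN-8080 patch. It assumes that ALWAYS components keep using whatever readiness probe is configured, while ON_FAILURE / NEVER instances count as READY only after exiting with code 0.

```java
class ReadinessSketch {
    // Hypothetical enum mirroring the three policies named in the issue.
    enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

    /**
     * Illustrative readiness rule (an assumption, not the actual default check).
     *
     * @param policy      restart policy of the component
     * @param probePassed result of the ordinary readiness probe
     * @param exitCode    container exit code, or null if still running
     */
    static boolean isReady(RestartPolicy policy, boolean probePassed, Integer exitCode) {
        if (policy == RestartPolicy.ALWAYS) {
            // Long-running instances: defer to the ordinary probe.
            return probePassed;
        }
        // ON_FAILURE / NEVER: READY only once the instance has succeeded.
        return exitCode != null && exitCode == 0;
    }
}
```

Whether the ordinary probe should also be required for ON_FAILURE / NEVER instances is left open here; the sketch ignores it for those policies.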

> YARN native service should support component restart policy
> -----------------------------------------------------------
>
>                 Key: YARN-8080
>                 URL: https://issues.apache.org/jira/browse/YARN-8080
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Assignee: Suma Shivaprasad
>            Priority: Critical
>         Attachments: YARN-8080.001.patch, YARN-8080.002.patch, YARN-8080.003.patch, YARN-8080.005.patch,
YARN-8080.006.patch, YARN-8080.007.patch
>
>
> The existing native service framework assumes that a service is long running and never finishes.
Containers are restarted even if the exit code == 0.
> To support broader use cases, we need to let users specify a restart policy for each component.
The proposed policies are:
> 1) Always: the framework always restarts containers, regardless of container exit status.
This is the existing/default behavior.
> 2) Never: do not restart a container in any case after it finishes. This supports
job-like workloads (for example, a TensorFlow training job). If a task exits with code == 0, we
should not restart the task. This can also be used by services that are not restartable or recoverable.
> 3) On-failure: similar to the above, but restart a task only when its exit code != 0.
> Behavior after a component *instance* finalizes (Succeeded or Failed when restart_policy
!= ALWAYS):
> 1) For a single component with a single instance: the service completes.
> 2) For a single component with multiple instances: other running instances of the same component
are not affected by the finalized component instance. The service is terminated once all
instances have finalized.
> 3) For multiple components: the service is terminated once all components have finalized.
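
The policies and termination rules quoted above can be sketched roughly as follows. This is an illustration only: the class, enum, and method names are hypothetical and do not come from the YARN service code or the attached patches, and a component is modeled simply as a list of per-instance "finalized" flags.

```java
import java.util.List;

class RestartPolicySketch {
    // Hypothetical enum for the three proposed policies.
    enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

    /** Should the framework relaunch a container that exited with exitCode? */
    static boolean shouldRestart(RestartPolicy policy, int exitCode) {
        switch (policy) {
            case ALWAYS:
                return true;              // existing/default behavior
            case NEVER:
                return false;             // job-like workloads
            case ON_FAILURE:
                return exitCode != 0;     // restart only failed tasks
            default:
                throw new IllegalArgumentException("unknown policy " + policy);
        }
    }

    /** An instance is finalized once it has exited and will not be restarted. */
    static boolean isFinalized(RestartPolicy policy, boolean exited, int exitCode) {
        return exited && !shouldRestart(policy, exitCode);
    }

    /**
     * The service terminates once every instance of every component is finalized.
     * Each inner list holds the finalized flags of one component's instances.
     * In this sketch a service with no instances trivially returns true.
     */
    static boolean serviceShouldTerminate(List<List<Boolean>> finalizedByComponent) {
        for (List<Boolean> component : finalizedByComponent) {
            for (boolean finalized : component) {
                if (!finalized) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Under this model an ALWAYS instance never finalizes, so a service containing any ALWAYS component keeps running, which matches the existing long-running behavior.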



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

