hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-8489) Need to support "dominant" component concept inside YARN service
Date Mon, 15 Oct 2018 22:17:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650876#comment-16650876 ]

Eric Yang edited comment on YARN-8489 at 10/15/18 10:16 PM:
------------------------------------------------------------

We might be able to refine our existing definitions to enable this without defining an additional
restart policy or state.  Suppose a service defines two components, A and B, where B depends
on A and component A has restart_policy=NEVER.  If component A fails, the AM will toggle component
A's state to FLEXING, and component B continues to run.  The service is most likely no longer
working once it reaches this state.  We may want to shut down the service to match the expected
behavior in this JIRA.
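
The rule proposed above can be sketched roughly as follows. This is only an illustration of the decision, not code from the YARN service AM: the `Component` class, the `failed` flag, and `should_stop_service` are hypothetical names, and the sketch assumes the service should stop as soon as any component depends on a failed component whose restart policy is NEVER.

```python
from dataclasses import dataclass, field
from enum import Enum


class RestartPolicy(Enum):
    ALWAYS = "ALWAYS"
    ON_FAILURE = "ON_FAILURE"
    NEVER = "NEVER"


@dataclass
class Component:
    # Hypothetical stand-in for a service component, not a YARN class.
    name: str
    restart_policy: RestartPolicy
    dependencies: list = field(default_factory=list)
    failed: bool = False


def should_stop_service(components):
    """Return True if some component depends on a failed component
    that will never be restarted (restart_policy=NEVER)."""
    by_name = {c.name: c for c in components}
    for comp in components:
        for dep_name in comp.dependencies:
            dep = by_name.get(dep_name)
            if (dep is not None and dep.failed
                    and dep.restart_policy is RestartPolicy.NEVER):
                return True
    return False


# Component B depends on A; A is not restartable and has failed.
a = Component("A", RestartPolicy.NEVER, failed=True)
b = Component("B", RestartPolicy.ALWAYS, dependencies=["A"])
print(should_stop_service([a, b]))
```

With this check, a failed component that nothing depends on would not by itself stop the service; whether that is the desired behavior is left open here.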


was (Author: eyang):
We might be able to refine our existing definitions to enable this without defining additional
restart policy or state.  If a service has two components defined, component A and B.  B depends
on A.  Component A restart_policy=NEVER.  If component A failed, AM will toggle component
A state to FLEXING, and component B continues to run.  Service is most likely not working
anymore when it reach this state.  We may want to shutdown the service to match the expected
behavior in this JIRA.

> Need to support "dominant" component concept inside YARN service
> ----------------------------------------------------------------
>
>                 Key: YARN-8489
>                 URL: https://issues.apache.org/jira/browse/YARN-8489
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Priority: Major
>
> The existing YARN service supports a termination policy for each restart policy. For
> example, ALWAYS means the service will not be terminated, and NEVER means that if all
> components have terminated, the service will be terminated.
> The name "dominant" might not be the most appropriate; we can figure out a better name. But
> put simply, it means a component whose final state determines the job's final state
> regardless of other components.
> Use cases:
> 1) A TensorFlow job has master/worker/services/tensorboard. Once the master reaches a final
> state, whether it succeeded or failed, we should terminate ps/tensorboard/workers and then
> mark the job as succeeded/failed.
> 2) Not sure if this is a real-world use case: a service has multiple components, some of
> which are not restartable. For such a service, if a component fails, we should mark
> the whole service as failed.
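
Use case 1 could be sketched as below. This is a minimal illustration under stated assumptions, not an existing YARN API: the state strings, `FINAL_STATES`, and `resolve_service` are hypothetical, and the sketch assumes a single dominant component per service.

```python
FINAL_STATES = {"SUCCEEDED", "FAILED"}


def resolve_service(component_states, dominant):
    """Given a mapping of component name -> state and the name of the
    dominant component, return (service_state, components_to_terminate)."""
    dominant_state = component_states[dominant]
    if dominant_state not in FINAL_STATES:
        # The dominant component is still running, so the service is too.
        return "RUNNING", []
    # The dominant component finished: every other component that is
    # not already in a final state should be terminated.
    to_terminate = sorted(
        name
        for name, state in component_states.items()
        if name != dominant and state not in FINAL_STATES
    )
    return dominant_state, to_terminate


states = {
    "master": "FAILED",
    "worker": "RUNNING",
    "ps": "RUNNING",
    "tensorboard": "RUNNING",
}
print(resolve_service(states, "master"))
```

Here the job's final state is taken from the master alone, regardless of what state the other components are in.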



