hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8489) Need to support "dominant" component concept inside YARN service
Date Tue, 16 Oct 2018 23:57:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652684#comment-16652684 ]

Eric Yang commented on YARN-8489:

{quote}If it is never, dominant field will be ignored. Otherwise dominant field is allowed.{quote}

If we go with what you proposed, the behavior of the dominant field and the restart policy
will not match user expectations.  The earlier comment proposed cleaning up the other components
when the dominant component finishes.  The dominant component could be a batch job that should
not be repeated, so ignoring the field does not sound like the right solution here.

Changing the dependent component state to FAILED to signal the other components to terminate
seems like a more intuitive approach to the state transition problem.  This ensures that a
state change triggered by the restart policy or by an upgrade requires no additional logic
to safeguard the dominant component.

- Transition to SUCCEEDED && component.dominant == true: Set service state to SUCCEEDED.

- Transition to FAILED && component.dominant == true: Set service state to FAILED.


It looks like you want the service to report success or failure based on the status of the
"important" component, instead of requiring every component to report SUCCEEDED before the
service state becomes SUCCEEDED.  A safer way to enable this logic is a boolean flag at the
component level, such as "report_as_service_state":true.  This requires no change to the state
transition logic, only an additional check at the end.
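
The final check described above might look roughly like the following sketch. It is not code
from the YARN service framework: the class, the Component type, and the reportAsServiceState
field (standing in for the proposed "report_as_service_state" flag) are all hypothetical, and
the handling of several flagged components is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in YARN under these names.
final class ServiceStateSketch {
    enum State { SUCCEEDED, FAILED }

    // Minimal stand-in for a component that has already reached a final state.
    static final class Component {
        final String name;
        final State finalState;
        final boolean reportAsServiceState; // assumed flag, mirrors "report_as_service_state"

        Component(String name, State finalState, boolean reportAsServiceState) {
            this.name = name;
            this.finalState = finalState;
            this.reportAsServiceState = reportAsServiceState;
        }
    }

    // Runs once, after the normal state transitions have finished.
    static State resolveServiceState(List<Component> components) {
        List<Component> flagged = new ArrayList<>();
        for (Component c : components) {
            if (c.reportAsServiceState) {
                flagged.add(c);
            }
        }
        // With no flagged component, fall back to "every component must succeed".
        List<Component> deciding = flagged.isEmpty() ? components : flagged;
        for (Component c : deciding) {
            if (c.finalState != State.SUCCEEDED) {
                return State.FAILED;
            }
        }
        return State.SUCCEEDED;
    }

    public static void main(String[] args) {
        List<Component> job = Arrays.asList(
            new Component("master", State.FAILED, true),
            new Component("worker", State.SUCCEEDED, false));
        System.out.println(resolveServiceState(job)); // prints FAILED
    }
}
```

Because the check only reads final states, the transition logic itself stays untouched, which
is the point of the flag-based approach.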

> Need to support "dominant" component concept inside YARN service
> ----------------------------------------------------------------
>                 Key: YARN-8489
>                 URL: https://issues.apache.org/jira/browse/YARN-8489
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Priority: Major
> Existing YARN service supports termination policies for different restart policies. For
> example, ALWAYS means the service will not be terminated, and NEVER means that if all components
> terminate, the service will be terminated.
> The name "dominant" might not be the most appropriate; we can figure out a better name. Simply
> put, it means a component whose final state determines the job's final state, regardless of
> the other components.
> Use cases:
> 1) Tensorflow job has master/worker/services/tensorboard. Once master goes to a final state,
> no matter whether it succeeded or failed, we should terminate ps/tensorboard/workers, and
> then mark the job as succeeded/failed.
> 2) Not sure if it is a real-world use case: a service which has multiple components, some
> of which are not restartable. For such services, if a component fails, we should mark
> the whole service as failed.

This message was sent by Atlassian JIRA
