hadoop-yarn-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication
Date Fri, 13 Feb 2015 22:23:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320872#comment-14320872 ]

Hitesh Shah commented on YARN-3131:
-----------------------------------

[~jlowe] Referring to my earlier comment, does it make more sense to do the simple checks
inline instead of doing them as part of the app state machine? The issue mainly stems from
the fact that in Tez, we start an AM and then submit work to it directly. When the AM is
never launched, the underlying reason it was never launched sometimes gets hidden.

bq. YarnRunner "works" because it bothers to do one extra appreport after the app submission
completes to verify the app is still in a non-failed/killed state.

We added a unit test for this and have seen it fail randomly on a minicluster, because
catching the failure on the first getAppReport() call is not reliable. Ref: TEZ-2058
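
To make the "simple inline check" idea concrete, here is a minimal sketch of polling the application state after submission instead of relying on a single report. It is not the YARN or Tez implementation. The class SubmitCheck, the AppState enum (whose names mirror YarnApplicationState), and the Supplier standing in for repeated getAppReport() calls are all hypothetical, and it assumes that reaching ACCEPTED or later is an acceptable stopping point.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of YARN or Tez.
final class SubmitCheck {

    // Stand-in for YARN's application states; names mirror YarnApplicationState.
    public enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /**
     * Polls reportSource (an assumed stand-in for repeated getAppReport() calls)
     * until the app leaves the pre-acceptance states or maxPolls is exhausted.
     * Throws if the app is seen in FAILED or KILLED; otherwise returns the last
     * state observed, which may still be a pre-acceptance state.
     */
    public static AppState waitForStableState(Supplier<AppState> reportSource,
                                              int maxPolls, long pollIntervalMs) {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be at least 1");
        }
        AppState state = null;
        for (int i = 0; i < maxPolls; i++) {
            state = reportSource.get();
            switch (state) {
                case FAILED:
                case KILLED:
                    // Surface the failure instead of reporting a successful submission.
                    throw new IllegalStateException(
                            "Application reached " + state + " right after submission");
                case ACCEPTED:
                case RUNNING:
                case FINISHED:
                    return state;
                default:
                    // NEW, NEW_SAVING, SUBMITTED: not decided yet, wait and poll again.
                    try {
                        Thread.sleep(pollIntervalMs);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while polling", e);
                    }
            }
        }
        return state;
    }
}
```

A caller would pass something that fetches the current state on each call and would treat the IllegalStateException as a failed submission. Polling past the first report is the point of the sketch: a single check can still observe NEW or SUBMITTED before the failure is recorded.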





> YarnClientImpl should check FAILED and KILLED state in submitApplication
> ------------------------------------------------------------------------
>
>                 Key: YARN-3131
>                 URL: https://issues.apache.org/jira/browse/YARN-3131
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>
> I just ran into an issue: when a job is submitted to a non-existent queue, YarnClient raises
no exception. Although the job is indeed submitted successfully and only fails immediately
afterward, it would be better if YarnClient handled this immediate-failure case the way
YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
