tez-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases
Date Mon, 09 Feb 2015 20:07:36 GMT

    [ https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312773#comment-14312773 ]

Hitesh Shah commented on TEZ-2064:

+1 for both (1) and (2). 

And yes, I am not sure about (3): it adds overhead to guard against a case that should
rarely happen, and that case will eventually be caught anyway when the actual submission takes place.

> SessionNotRunning Exception not thrown is all cases
> ---------------------------------------------------
>                 Key: TEZ-2064
>                 URL: https://issues.apache.org/jira/browse/TEZ-2064
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Priority: Critical
> Hive handles SessionNotRunning during submitDAG() and restarts the tez-session
> if it receives one. In YHIVE-15, we did not receive that and the query failed. In some
> scenarios the Application will fall out of the RM's knowledge and an ApplicationNotFound
> exception is received instead.
> Here are my asks.
> 1. TezClient.submitDAG()/stop() should throw a SessionNotRunning exception if
> the application has expired. Basically, any API which currently throws
> SessionNotRunning should handle the app-not-found scenario.
> 2. It would help if TezClient.getAppMasterStatus() can return
> TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM.
> That way, as a precaution, applications could check before submitting DAGs.
> 3. I think it might be better if verifySessionStateForSubmission() checks the
> app status every time instead of checking sessionStarted. I am not sure about
> side effects, but will leave that to your decision.
> If 3 takes time, we can pursue that later. It would really help to get 1 & 2 in
> the next tez release, especially for busy grids.
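
For illustration only, asks (1) and (2) can be sketched as below. This is a minimal
model under stated assumptions, not the Tez implementation: FakeResourceManager,
SessionClient, AppNotFound, SessionNotRunning and Status are hypothetical stand-ins
defined in the sketch itself, and none of them are the real Tez or YARN classes.
The idea is that an "application not found" answer from the RM is translated into
the session-level error the caller already handles, and that the status query
reports SHUTDOWN rather than propagating the lookup failure.

```java
import java.util.HashSet;
import java.util.Set;

final class SessionGuardSketch {

    /** Stand-in for the RM's "unknown application" error (hypothetical type). */
    static final class AppNotFound extends RuntimeException {
        AppNotFound(String appId) { super("Application not found: " + appId); }
    }

    /** Stand-in for the session-level error callers already handle (hypothetical type). */
    static final class SessionNotRunning extends Exception {
        SessionNotRunning(String message, Throwable cause) { super(message, cause); }
    }

    enum Status { RUNNING, SHUTDOWN }

    /** Minimal model of the resource manager: it only knows which applications exist. */
    static final class FakeResourceManager {
        private final Set<String> liveApps = new HashSet<>();
        void register(String appId) { liveApps.add(appId); }
        void expire(String appId) { liveApps.remove(appId); }
        void requireKnown(String appId) {
            if (!liveApps.contains(appId)) throw new AppNotFound(appId);
        }
    }

    /** Minimal model of a session client bound to one application. */
    static final class SessionClient {
        private final FakeResourceManager rm;
        private final String appId;

        SessionClient(FakeResourceManager rm, String appId) {
            this.rm = rm;
            this.appId = appId;
        }

        /** Ask (1): surface app-not-found as the session-not-running error. */
        String submit(String dagName) throws SessionNotRunning {
            try {
                rm.requireKnown(appId);
                return dagName + "@" + appId;
            } catch (AppNotFound e) {
                throw new SessionNotRunning("Session application " + appId + " is gone", e);
            }
        }

        /** Ask (2): report SHUTDOWN instead of propagating app-not-found. */
        Status status() {
            try {
                rm.requireKnown(appId);
                return Status.RUNNING;
            } catch (AppNotFound e) {
                return Status.SHUTDOWN;
            }
        }
    }

    public static void main(String[] args) {
        FakeResourceManager rm = new FakeResourceManager();
        rm.register("app_1");
        SessionClient client = new SessionClient(rm, "app_1");

        rm.expire("app_1"); // the application falls out of the RM's knowledge
        System.out.println("status: " + client.status());
        try {
            client.submit("dag");
        } catch (SessionNotRunning e) {
            // A caller such as Hive would restart the session here.
            System.out.println("restart session: " + e.getMessage());
        }
    }
}
```

In this model every call consults the resource manager, which is the per-call
overhead raised against ask (3). A real client would have to weigh that cost
against relying on the failure surfacing at submission time.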

This message was sent by Atlassian JIRA
