tez-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-1141) DAGStatus.Progress should include number of failed attempts
Date Wed, 22 Oct 2014 18:48:33 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180310#comment-14180310 ]

Hitesh Shah commented on TEZ-1141:

bq. Since this is a departure from not updating state machine stuff outside of the transitions,
we should probably put a comment explaining why we are doing it and commit this.

Will do. 

bq. Probably avoid checking for the impl - so that the test passes when the impl changes but
the external effect is still tested.

Would prefer to keep this in for now, as it is the only test of the vertex counter increment
functionality. If an impl change causes this test to fail, the developer making that change
should fix the test. 

bq. Though the above logic will probably not apply if we have to add support for running attempts
also. Because they will go up and down. Will speculation create the need to provide running
attempt count?

You are correct. This cannot work for cases where counts go up and down/get reversed. Such
cases can only be handled via the necessary events or by querying each and every object. 
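
To illustrate the distinction above, here is a minimal sketch, not the TEZ-1141 patch and not the real vertex implementation. The class and method names (AttemptCountersSketch, onAttemptStarted, onAttemptFinished) are hypothetical. It assumes a failed-attempt count only ever increases, so it can be bumped in place when a failure is seen, while a running-attempt count rises and falls and therefore needs a matching event for every start and every finish.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; names do not come from the Tez code base. */
final class AttemptCountersSketch {
    // Monotonic: only ever incremented, so a single in-place bump is enough.
    private final AtomicInteger failedAttempts = new AtomicInteger();
    // Non-monotonic: goes up on start and down on finish, so both events are required.
    private final AtomicInteger runningAttempts = new AtomicInteger();

    void onAttemptStarted() {
        runningAttempts.incrementAndGet();
    }

    void onAttemptFinished(boolean failed) {
        // Without this paired event the running count would drift upwards forever.
        runningAttempts.decrementAndGet();
        if (failed) {
            failedAttempts.incrementAndGet();
        }
    }

    int getFailedAttemptCount() {
        return failedAttempts.get();
    }

    int getRunningAttemptCount() {
        return runningAttempts.get();
    }

    public static void main(String[] args) {
        AttemptCountersSketch counters = new AttemptCountersSketch();
        counters.onAttemptStarted();
        counters.onAttemptStarted();
        counters.onAttemptFinished(true);
        // prints "1 failed, 1 running"
        System.out.println(counters.getFailedAttemptCount() + " failed, "
                + counters.getRunningAttemptCount() + " running");
    }
}
```

If a finish event is missed in this sketch, the failed count is still never too high, but the running count is wrong from then on. That is why a count that can be reversed needs full event coverage or a query over every object.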

bq. Do we also need to add killed attempts in the progress? In case, many attempts are being
killed due to cluster issues (bad nodes etc) which don't surface now?

Might be a good idea. This request was raised by [~gopalv] when looking at using it in Hive.
[~gopalv] any comments? 


> DAGStatus.Progress should include number of failed attempts
> -----------------------------------------------------------
>                 Key: TEZ-1141
>                 URL: https://issues.apache.org/jira/browse/TEZ-1141
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Bikas Saha
>            Assignee: Hitesh Shah
>         Attachments: TEZ-1141.1.patch
> Currently it's impossible to know whether a job is seeing a lot of issues and failures
because we only report running tasks. Eventually the job fails, but before that we have no
indication that a bunch of task failures have been happening.

This message was sent by Atlassian JIRA
