tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-1141) DAGStatus.Progress should include number of failed attempts
Date Wed, 22 Oct 2014 00:25:34 GMT

    [ https://issues.apache.org/jira/browse/TEZ-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179392#comment-14179392
] 

Bikas Saha commented on TEZ-1141:
---------------------------------

Since this departs from the rule of not updating state machine state outside of the transitions,
we should probably add a comment explaining why we are doing it, and then commit this.
The test should probably avoid checking the implementation, so that it still passes when the
implementation changes while the external effect is still tested.
{code}+    verify(mockTask.getVertex(), times(1)).incrementFailedTaskAttemptCount();{code}
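
As a rough sketch of what testing the external effect could look like: the check below asserts on the count a client would read, rather than verifying that a particular method was called on a mock. SketchVertex, SketchTask, onAttemptFailed() and getFailedTaskAttemptCount() are hypothetical stand-ins for illustration, not the actual Tez classes or the code in the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FailedAttemptProgressSketch {

    /** Hypothetical stand-in for a vertex; not the real Tez vertex implementation. */
    static class SketchVertex {
        private final AtomicInteger failedTaskAttemptCount = new AtomicInteger();

        // Implementation detail: how the count gets bumped may change over time.
        void incrementFailedTaskAttemptCount() {
            failedTaskAttemptCount.incrementAndGet();
        }

        // External effect: the value a progress report would expose to clients.
        int getFailedTaskAttemptCount() {
            return failedTaskAttemptCount.get();
        }
    }

    /** Hypothetical stand-in for a task that reports attempt failures to its vertex. */
    static class SketchTask {
        private final SketchVertex vertex;

        SketchTask(SketchVertex vertex) {
            this.vertex = vertex;
        }

        void onAttemptFailed() {
            vertex.incrementFailedTaskAttemptCount();
        }
    }

    public static void main(String[] args) {
        SketchVertex vertex = new SketchVertex();
        SketchTask task = new SketchTask(vertex);

        int before = vertex.getFailedTaskAttemptCount();
        task.onAttemptFailed();
        int after = vertex.getFailedTaskAttemptCount();

        // Check the observable count, not which internal method was invoked.
        if (after != before + 1) {
            throw new AssertionError("expected failed attempt count to grow by 1");
        }
        System.out.println("failed attempts: " + after);
    }
}
```

A check written this way would keep passing if the count were later updated through a different internal path, as long as the reported number stays correct.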

Thinking further: the above logic will probably not apply if we also have to add support for
running attempts, because those counts will go up and down. Will speculation create the need
to provide a running attempt count?
Do we also need to add killed attempts to the progress, in case many attempts are being killed
due to cluster issues (bad nodes etc.) that don't surface now?
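
To illustrate the difference between the counts, here is a minimal sketch, assuming a simple counter holder that is not part of Tez: failed and killed counts only ever grow, while a running count behaves like a gauge that rises and falls. AttemptCountersSketch and all of its methods are hypothetical names.

```java
public class AttemptCountersSketch {
    private int running; // gauge: goes up when an attempt starts, down when it ends
    private int failed;  // monotonic: only ever increases
    private int killed;  // monotonic: only ever increases

    public synchronized void attemptStarted() {
        running++;
    }

    public synchronized void attemptSucceeded() {
        running--;
    }

    public synchronized void attemptFailed() {
        running--;
        failed++;
    }

    public synchronized void attemptKilled() {
        running--;
        killed++;
    }

    public synchronized int getRunning() {
        return running;
    }

    public synchronized int getFailed() {
        return failed;
    }

    public synchronized int getKilled() {
        return killed;
    }

    public static void main(String[] args) {
        AttemptCountersSketch counters = new AttemptCountersSketch();
        counters.attemptStarted();
        counters.attemptStarted();
        counters.attemptStarted();
        counters.attemptFailed(); // e.g. a task error
        counters.attemptKilled(); // e.g. a bad node
        System.out.println("running=" + counters.getRunning()
                + " failed=" + counters.getFailed()
                + " killed=" + counters.getKilled());
    }
}
```

In this sketch every update happens under the same lock, which sidesteps the question raised above about where a value that both increases and decreases should be updated.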

> DAGStatus.Progress should include number of failed attempts
> -----------------------------------------------------------
>
>                 Key: TEZ-1141
>                 URL: https://issues.apache.org/jira/browse/TEZ-1141
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Bikas Saha
>            Assignee: Hitesh Shah
>         Attachments: TEZ-1141.1.patch
>
>
> Currently it's impossible to know whether a job is seeing a lot of issues and failures
> because we only report running tasks. Eventually the job fails, but before that we have no
> indication that a bunch of task failures have been happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
