hadoop-mapreduce-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
Date Thu, 01 Jun 2017 21:41:04 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033746#comment-16033746 ]

Haibo Chen commented on MAPREDUCE-6892:
---------------------------------------

Thanks [~mozer] for reporting and fixing the issue. The map/reduce counters in JobUnsuccessfulCompletionEvent
are inconsistent. In some cases, the number of successful mappers is passed to JobUnsuccessfulCompletion.finishedMaps.
In other cases, the number of successful + failed + killed mappers is reported. I think we should
make it at least consistent, and then add the failed/killed mapper/reducer counts.

org.apache.hadoop.mapreduce.v2.app.job.Job is not client facing as far as I can tell, so I
think it's fine to add a few more methods (killedMapTaskCount, killedReduceTaskCount). In
upgrade cases though, JHS expects the new JobUnsuccessfulCompletion and JobFinishedEvent schema,
but it could pick up old .jhist files that do not conform to the new schema. We want to
make sure it handles that situation gracefully. Can you check that?
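
To illustrate the upgrade concern, here is a minimal sketch of one way a reader could tolerate records written before the failed/killed counts existed. It is not the JHS code or the patch: the record is modelled as a plain map, and the class name, the field names "failedMaps" and "killedMaps", and the choice of -1 as an "unknown" marker are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a decoded .jhist event record; not a Hadoop class.
class JhistCompatSketch {

    /** Returns the named count, or the default when an old-schema record lacks the field. */
    static int countOrDefault(Map<String, Integer> record, String field, int defaultValue) {
        Integer value = record.get(field);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // Old-schema record: only the counts that existed before the change.
        Map<String, Integer> oldRecord = new HashMap<>();
        oldRecord.put("finishedMaps", 7);
        oldRecord.put("finishedReduces", 2);

        // New-schema record: also carries failed/killed counts (field names are assumed).
        Map<String, Integer> newRecord = new HashMap<>(oldRecord);
        newRecord.put("failedMaps", 3);
        newRecord.put("killedMaps", 1);

        // -1 marks "unknown" so callers can tell it apart from a real zero.
        System.out.println("old failedMaps: " + countOrDefault(oldRecord, "failedMaps", -1));
        System.out.println("new failedMaps: " + countOrDefault(newRecord, "failedMaps", -1));
    }
}
```

Whether "unknown" or 0 is the better default for an old file is a design choice; returning 0 would reproduce the misleading result described in the report.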

> Issues with the count of failed/killed tasks in the jhist file
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6892
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, jobhistoryserver
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6892-001.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After parsing the
jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually there were failures. 
> Another minor thing is that you cannot get the number of killed tasks (although this
can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the successful
map/reduce task counts. The number of failed (or killed) tasks is not stored.
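
As a rough sketch of how the missing counts "can be calculated", the snippet below tallies per-task final states into separate succeeded/failed/killed counts. It assumes each task ends in exactly one of those three states and that the final states are available as a list; the enum and class names are hypothetical and are not Hadoop types.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class TaskTallySketch {

    // Hypothetical terminal task states; not Hadoop's own enum.
    enum FinalState { SUCCEEDED, FAILED, KILLED }

    /** Counts how many tasks ended in each final state; states that never occur map to 0. */
    static Map<FinalState, Integer> tally(List<FinalState> finalStates) {
        Map<FinalState, Integer> counts = new EnumMap<>(FinalState.class);
        for (FinalState state : FinalState.values()) {
            counts.put(state, 0);
        }
        for (FinalState state : finalStates) {
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<FinalState> mapTasks = Arrays.asList(
                FinalState.SUCCEEDED, FinalState.SUCCEEDED, FinalState.FAILED,
                FinalState.KILLED, FinalState.SUCCEEDED);
        System.out.println(tally(mapTasks));
    }
}
```

Keeping the three counts separate avoids the ambiguity of a single "finished" number that sometimes means succeeded only and sometimes means all completed tasks.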



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

