hadoop-mapreduce-issues mailing list archives

From "Gera Shegalov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
Date Mon, 11 Aug 2014 20:06:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093224#comment-14093224 ]

Gera Shegalov commented on MAPREDUCE-4818:

Jason, we have hit the same problem with a weaker network setup. I think it would be good
to introduce a new "localization" phase in MR task progress. E.g., mappers would then have
"localization", "map", and "sort" phases. We should think about how much of the total progress
to attribute to localization, maybe 5%. For backwards compatibility, we can keep it at 0% but
still heartbeat to the AM with a status string such as "5/300 files localized".
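
A minimal sketch of the bookkeeping this would need, assuming a configurable weight for the localization phase. The class and method names below (LocalizationProgress, fileLocalized, statusString, overallProgress) are hypothetical and not part of any Hadoop API; how the result would be wired into the actual heartbeat to the AM is left open.

```java
// Hypothetical sketch: none of these names exist in Hadoop.
public final class LocalizationProgress {
    private final int totalFiles;
    // Share of overall task progress attributed to localization,
    // e.g. 0.05f for 5%, or 0.0f to keep reported progress unchanged.
    private final float localizationWeight;
    private int localizedFiles;

    public LocalizationProgress(int totalFiles, float localizationWeight) {
        if (totalFiles < 0) {
            throw new IllegalArgumentException("totalFiles must be >= 0");
        }
        if (localizationWeight < 0.0f || localizationWeight > 1.0f) {
            throw new IllegalArgumentException("weight must be in [0, 1]");
        }
        this.totalFiles = totalFiles;
        this.localizationWeight = localizationWeight;
    }

    // Record that one more resource has finished localizing.
    public void fileLocalized() {
        if (localizedFiles < totalFiles) {
            localizedFiles++;
        }
    }

    // Status string in the form suggested above, e.g. "5/300 files localized".
    public String statusString() {
        return localizedFiles + "/" + totalFiles + " files localized";
    }

    // Fraction of the localization phase completed, in [0, 1].
    public float localizationFraction() {
        return totalFiles == 0 ? 1.0f : (float) localizedFiles / totalFiles;
    }

    // Overall progress: localization gets its weight, the remaining
    // phases (e.g. "map" and "sort") share the rest.
    public float overallProgress(float laterPhasesFraction) {
        return localizationWeight * localizationFraction()
                + (1.0f - localizationWeight) * laterPhasesFraction;
    }
}
```

With a weight of 0 the numeric progress stays exactly as it is today, while the status string alone still tells the user how far localization got before a timeout.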

> Easier identification of tasks that timeout during localization
> ---------------------------------------------------------------
>                 Key: MAPREDUCE-4818
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 0.23.3, 2.0.3-alpha
>            Reporter: Jason Lowe
>              Labels: usability
> When a task is taking too long to localize and is killed by the AM due to task timeout,
> the job UI/history is not very helpful.  The attempt simply lists a diagnostic stating it
> was killed due to timeout, but there are no logs for the attempt since it never actually got
> started.  There are log messages on the NM that show the container never made it past localization
> by the time it was killed, but users often do not have access to those logs.

