hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress
Date Fri, 25 May 2007 17:16:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499169 ]

Devaraj Das commented on HADOOP-1431:

Doug, I agree with you that this issue should be handled more generally as part of HADOOP-1201,
scheduled for 0.14. That's why I put a comment (the first comment on this issue) to that effect
when Owen raised the bug. I believe that the sort progress reporting as it is done today has
been working fine for quite some time (many months, actually), and I can't remember what bug
got introduced there (sorry). The only reason the sort could get stuck is bad
user code in the Comparator, and I am not convinced that we would have handled that issue completely
without also handling the merge cases.
On a side note, one problem that exists today is that the child Map/Reduce processes sometimes
(rarely on Linux), for some reason, don't exit even after the map/reduce method invocations
are over (TaskRunner.run() doesn't exit, hence tracker.reportTaskFinished(t.getTaskId())
is not called, and finally the TaskTracker kills the process after the timeout interval in the method
But again, I am happy if we agree that we should look at this issue in more detail for 0.14.

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>         Attachments: HADOOP-1431_1_20070525.patch
> Currently the map task runner creates a thread that calls progress every second to keep
> the system from killing the map if the sort takes too long. This is the wrong approach, because
> it will cause stuck tasks to not be killed. The right solution is to have the sort call progress
> as it actually makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.
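
To illustrate the approach the description argues for, here is a minimal sketch in which progress is reported from inside the sort rather than from a separate heartbeat thread. It is not the code from the attached patch. ProgressReporter, reporting() and the "every" parameter are hypothetical names made up for this example and are not Hadoop APIs. The idea is that a comparator which hangs stops producing progress calls, so a timeout can still fire.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class SortProgressSketch {

    /** Hypothetical callback; stands in for however the framework records liveness. */
    public interface ProgressReporter {
        void progress();
    }

    /**
     * Wraps a comparator so that progress is reported once per `every`
     * completed comparisons. If the inner comparator never returns,
     * no further progress is reported.
     */
    public static <T> Comparator<T> reporting(
            Comparator<T> inner, ProgressReporter reporter, int every) {
        final long[] count = {0};
        return (a, b) -> {
            int result = inner.compare(a, b);
            if (++count[0] % every == 0) {
                reporter.progress();
            }
            return result;
        };
    }

    public static void main(String[] args) {
        // Counts progress calls; a real task would update a liveness timestamp instead.
        AtomicLong reports = new AtomicLong();

        List<Integer> data = new ArrayList<>();
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            data.add(random.nextInt());
        }

        data.sort(reporting(Comparator.<Integer>naturalOrder(),
                            reports::incrementAndGet, 1000));

        System.out.println("progress calls during sort: " + reports.get());
    }
}
```

A heartbeat thread, by contrast, keeps calling progress even when the comparator is blocked, which is why a stuck task would never be killed. The merge phase would need the same treatment for the timeout to be reliable.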

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
