hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress
Date Fri, 01 Jun 2007 17:47:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500778 ]

Raghu Angadi commented on HADOOP-1431:

Doug's comment that was posted to HADOOP-1134 by mistake:

Calvin Yu noted on hadoop-user that join() seems to sometimes hang even if the thread has
been interrupted. In other places we use the idiom of a 'running' flag that's checked in a
thread's loop in conjunction with an interrupt, rather than interrupt+join, and that seems
to be reliable. So I think we should switch to that here too.
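
A minimal sketch of that idiom, for illustration only. The class and member names (RunningFlagDemo, Worker, ticks) are made up here and are not taken from the Hadoop source or the patch. The loop tests a volatile flag on every pass, and stop() clears the flag before interrupting, so the interrupt only serves to cut the sleep short and nobody has to join() the thread.

```java
public class RunningFlagDemo {
    /** Worker whose loop is governed by a volatile flag (illustrative names). */
    static class Worker implements Runnable {
        private volatile boolean running = true;
        private final Thread thread = new Thread(this, "worker");
        private volatile int ticks = 0; // written only by the worker thread

        void start() { thread.start(); }

        /** Clear the flag first, then interrupt to wake the thread from sleep. */
        void stop() {
            running = false;
            thread.interrupt();
        }

        boolean isAlive() { return thread.isAlive(); }

        int ticks() { return ticks; }

        @Override
        public void run() {
            while (running) {
                ticks = ticks + 1; // stand-in for the real periodic work
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    // Nothing to do: the loop condition decides whether to exit.
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker();
        worker.start();
        Thread.sleep(200);
        worker.stop(); // returns immediately; the worker exits on its next flag check
        System.out.println("ticks so far: " + worker.ticks());
    }
}
```

Because the flag is checked in the loop condition, the thread exits even if the interrupt arrives while it is not sleeping.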

Also, in the current patch, I don't see why the thread is held in a field. I worry that someone
might add code like 'if (sortProgressThread == null) ...', and that we might somehow not always
null this field. If it is kept in a local variable around the call then this is much less
of a risk.

So I think we should convert the createProgressThread method to a nested class whose constructor
starts the thread and which has a stop() method that sets a flag. It would also be good if
the 'try' block could be shared between 'collect()' and 'flush()'. I think this calls for
a new method something like:

private void sortWithProgress() {
  ProgressThread progress = new ProgressThread();
  try { sortAndSpillToDisk(); } finally { progress.stop(); }
}
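
One way the nested class might look, as a rough sketch rather than the actual patch. The Progressable interface, the reporter and intervalMillis fields, and passing the sort in as a Runnable are all assumptions made to keep the example self-contained. Note that the class wraps a Thread instead of extending it: java.lang.Thread already declares a final stop() method, so a Thread subclass could not define its own stop().

```java
public class SortWithProgressSketch {
    /** Hypothetical stand-in for whatever the task uses to report liveness. */
    interface Progressable { void progress(); }

    private final Progressable reporter;
    private final long intervalMillis;

    SortWithProgressSketch(Progressable reporter, long intervalMillis) {
        this.reporter = reporter;
        this.intervalMillis = intervalMillis;
    }

    /** The constructor starts the thread; stop() sets a flag that the loop checks. */
    private class ProgressThread implements Runnable {
        private volatile boolean running = true;
        private final Thread thread = new Thread(this, "sort-progress");

        ProgressThread() {
            thread.setDaemon(true);
            thread.start();
        }

        void stop() {
            running = false;
            thread.interrupt(); // only to cut the sleep short; no join() needed
        }

        @Override
        public void run() {
            while (running) {
                reporter.progress();
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Fall through and re-check the flag.
                }
            }
        }
    }

    /** Shared wrapper; the Runnable stands in for sortAndSpillToDisk(). */
    void sortWithProgress(Runnable sortAndSpillToDisk) {
        ProgressThread progress = new ProgressThread(); // local, never a field
        try { sortAndSpillToDisk.run(); } finally { progress.stop(); }
    }

    public static void main(String[] args) {
        SortWithProgressSketch task =
            new SortWithProgressSketch(() -> System.out.println("progress"), 100);
        task.sortWithProgress(() -> { /* stand-in for the real sort */ });
    }
}
```

Both collect() and flush() could then call sortWithProgress(), and since the ProgressThread lives only in a local variable there is no field that might be left non-null.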

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>         Attachments: HADOOP-1431_1_20070525.patch, HADOOP-1431_2_20070530.patch, HADOOP-1431_3_20070601.patch
>
> Currently the map task runner creates a thread that calls progress every second to keep
> the system from killing the map if the sort takes too long. This is the wrong approach, because
> it will cause stuck tasks to not be killed. The right solution is to have the sort call progress
> as it actually makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.

