hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress
Date Thu, 31 May 2007 18:59:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500447 ]

Doug Cutting commented on HADOOP-1431:
--------------------------------------

Sigh. I wish this just started a new thread around each call to sortAndSpill, as I suggested
above, something like:

// Declared outside the try so that it is in scope in the finally block.
Thread progress = createProgressThread(umbilical);
try {
   sortAndSpill();
} finally {
   progress.interrupt();
}

As it stands, the call to stop the thread is in a finally, but after other things that could
throw exceptions, so there's no guarantee that the thread will actually exit.  And the calls
to pause the thread are not in a finally at all, so, if there's an exception in sorting, the
thread will keep reporting progress.  Reusing a thread seems like a premature optimization that
opens up lots of possible error modes that we don't need.  I think rather we should simply
narrow the scope of the prior logic.  Threads are plenty cheap for this and I don't see that
the optimization is worth either the risks it adds or the extra code to maintain.
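
To make the scoping argument concrete, here is a minimal self-contained sketch of a progress
thread whose lifetime is tied to a single sort call.  It is not the Hadoop code: ProgressSink
is a hypothetical stand-in for the umbilical, and the reporting interval and the join() in the
finally block are assumptions added for illustration.

```java
public class ScopedProgressSketch {
    /** Hypothetical stand-in for the task umbilical; not the Hadoop interface. */
    interface ProgressSink {
        void progress();
    }

    /** Starts a daemon thread that pings the sink until it is interrupted. */
    static Thread createProgressThread(ProgressSink sink, long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    sink.progress();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted by the caller's finally block: exit quietly.
            }
        }, "sort-progress");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs one sort-and-spill step with a progress thread scoped to exactly that call. */
    static void runWithProgress(ProgressSink sink, Runnable sortAndSpill)
            throws InterruptedException {
        Thread progress = createProgressThread(sink, 10);
        try {
            sortAndSpill.run();
        } finally {
            progress.interrupt(); // always runs, even if the sort throws
            progress.join();      // wait until the thread has actually exited
        }
    }
}
```

Because interrupt() is the first statement in the finally block, nothing that can throw sits
between the end of the sort and the request to stop the thread.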


> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch, HADOOP-1431_2_20070530.patch
>
>
> Currently the map task runner creates a thread that calls progress every second to keep
> the system from killing the map if the sort takes too long. This is the wrong approach, because
> it will cause stuck tasks to not be killed. The right solution is to have the sort call progress
> as it actually makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.
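
One way to picture "have the sort call progress as it actually makes progress" is a comparator
wrapper that reports every so many comparisons, so a sort that stops comparing also stops
reporting.  This is only a sketch under assumptions: ProgressSink, reporting() and the
1000-comparison interval are hypothetical names and values, not anything from the Hadoop
patches.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ProgressReportingSort {
    /** Hypothetical progress callback; not the Hadoop Reporter interface. */
    interface ProgressSink {
        void progress();
    }

    /** Wraps a comparator so the sink is pinged once per `every` comparisons. */
    static <T> Comparator<T> reporting(Comparator<T> inner, ProgressSink sink, int every) {
        int[] count = {0}; // mutable counter captured by the lambda
        return (a, b) -> {
            if (++count[0] % every == 0) {
                sink.progress();
            }
            return inner.compare(a, b);
        };
    }

    /** Sorts in place, reporting progress only while comparisons are happening. */
    static <T> void sort(T[] records, Comparator<T> order, ProgressSink sink) {
        Arrays.sort(records, reporting(order, sink, 1000));
        sink.progress(); // one final report once the sort has finished
    }
}
```

With this shape, a sort that hangs makes no further progress calls, so the usual task timeout
can still fire.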

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

