hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1431) Map tasks can't timeout for failing to call progress
Date Fri, 25 May 2007 16:10:16 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Arun C Murthy updated HADOOP-1431:

    Attachment: HADOOP-1431_1_20070525.patch

Here is a reasonably straightforward patch to address the concerns raised in this issue - basically
I have implemented a ReportingComparator which sends a progress update every 100 comparisons
and this comparator is used for sorting/merging in both MapTask & ReduceTask.

The idea is that the 'compare' operation is a metric independent of the actual sorting/merging
algorithm and hence a good indicator of the 'progress' being made by the sort/merge done by
the framework in map/reduce task... 

I have adopted a policy similar to the one already employed in MapTask, where the RecordReader
sends progress updates depending on the number of bytes consumed from the input file, i.e.
the ReportingComparator wraps a comparator and a reporter object and sends an update every
100 comparisons. The advantage is that the sort algorithm (which could be user code, i.e.
written by extending BasicTypeSorterBase) is blissfully unaware of the reporting going on under the
covers, and it also ensures that even user-supplied comparators (e.g. the one returned by
JobConf.getOutputValueGroupingComparator()) cannot bypass this reporting mechanism.
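
To make the wrapping idea concrete, here is a minimal sketch of such a comparator. It is not the code in the attached patch: it uses plain java.util.Comparator rather than Hadoop's comparator classes, and the ProgressSink interface is a hypothetical stand-in for the framework's reporter object. Only the name ReportingComparator and the 100-comparison interval come from the description above.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in for the framework's reporter object (an assumption). */
interface ProgressSink {
    void progress();
}

/**
 * Wraps another comparator and signals progress once every
 * REPORT_INTERVAL comparisons. The sort or merge code only ever sees
 * a Comparator, so it needs no knowledge of the reporting.
 */
final class ReportingComparator<T> implements Comparator<T> {
    private static final int REPORT_INTERVAL = 100;

    private final Comparator<T> delegate;
    private final ProgressSink sink;
    private long comparisons = 0; // not thread-safe; assumes one sorting thread

    ReportingComparator(Comparator<T> delegate, ProgressSink sink) {
        this.delegate = delegate;
        this.sink = sink;
    }

    @Override
    public int compare(T a, T b) {
        // Report only when real comparison work is happening.
        if (++comparisons % REPORT_INTERVAL == 0) {
            sink.progress();
        }
        return delegate.compare(a, b);
    }
}

/** Small demo: sort with the wrapped comparator and count the updates. */
final class ReportingComparatorDemo {
    public static void main(String[] args) {
        Integer[] data = new Integer[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = data.length - i; // descending input, sorted ascending below
        }
        int[] updates = {0};
        Comparator<Integer> cmp = new ReportingComparator<>(
                Comparator.<Integer>naturalOrder(), () -> updates[0]++);
        Arrays.sort(data, cmp);
        System.out.println("progress updates sent: " + updates[0]);
    }
}
```

Because the counter only advances inside compare(), a sort that is stuck stops producing updates, which is what allows the task timeout to fire.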

Appreciate review/feedback while I continue testing... I know Devaraj has some. *smile*

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>         Attachments: HADOOP-1431_1_20070525.patch
> Currently the map task runner creates a thread that calls progress every second to keep
> the system from killing the map if the sort takes too long. This is the wrong approach, because
> it will cause stuck tasks to not be killed. The right solution is to have the sort call progress
> as it actually makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
