hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1246) Ignored exceptions from MapOutputLocation.java:getFile lead to hung reduces
Date Wed, 11 Apr 2007 13:03:32 GMT
Ignored exceptions from MapOutputLocation.java:getFile lead to hung reduces

                 Key: HADOOP-1246
                 URL: https://issues.apache.org/jira/browse/HADOOP-1246
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.3
            Reporter: Arun C Murthy

Exceptions raised while fetching map outputs in MapOutputLocation.java:getFile (e.g. when the
content-length doesn't match the actual data received) lead to hung reduces, since the
MapOutputCopier just ignores them, puts the host in the penalty box and retries forever.

Possible steps:
a) Distinguish between failure to fetch output vs. lost maps. (related to HADOOP-1158)
b) Ensure the reduce doesn't keep fetching from 'lost maps'. (related to HADOOP-1183)
c) On detection of a 'failure to fetch' we should probably use exponential back-offs (versus
the same-order back-offs used currently) for hosts in the 'penalty box'.
d) If fetches still fail, say, 4 times (after exponential back-offs), we should declare the
Reduce as 'failed'.
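
Steps (c) and (d) could be sketched roughly as below. This is only an illustration, not the actual MapOutputCopier code: the class FetchPenaltyBox, its method names and the 1-second base delay are all hypothetical; only the limit of 4 failures comes from step (d).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-host fetch-failure tracking; not Hadoop's real code. */
class FetchPenaltyBox {
    // "say 4 times" from step (d); the value is a suggestion, not a tested default.
    static final int MAX_FETCH_FAILURES = 4;
    // Assumed base delay; the issue does not specify one.
    static final long BASE_BACKOFF_MS = 1_000L;

    // Consecutive fetch failures per host.
    private final Map<String, Integer> failures = new HashMap<>();

    /** Records a failed fetch and returns the host's consecutive failure count. */
    int recordFailure(String host) {
        return failures.merge(host, 1, Integer::sum);
    }

    /** A successful fetch clears the host's failure history. */
    void recordSuccess(String host) {
        failures.remove(host);
    }

    /** Penalty delay before the next retry: base * 2^(failures - 1), or 0 with no failures. */
    long backoffMillis(String host) {
        int n = failures.getOrDefault(host, 0);
        return n == 0 ? 0L : BASE_BACKOFF_MS << (n - 1);
    }

    /** True once the host has failed often enough that the reduce should be declared failed. */
    boolean shouldFailReduce(String host) {
        return failures.getOrDefault(host, 0) >= MAX_FETCH_FAILURES;
    }
}
```

With these assumed values a host would be penalised for 1s, 2s and 4s after its first three failures, and the fourth failure would mark the reduce as failed instead of retrying forever.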

This could also arise from conditions such as a full disk on the reducer, where it
isn't possible to save the map output to the local disk (say, for large map outputs).


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
