hadoop-common-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1087) Reducer hangs pulling from incorrect file.out.index path. (when one of the mapred.local.dir is not accessible but becomes available later at reduce time)
Date Thu, 08 Mar 2007 00:00:27 GMT
Reducer hangs pulling from incorrect file.out.index path. (when one of the mapred.local.dir
is not accessible but becomes available later at reduce time)
---------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1087
                 URL: https://issues.apache.org/jira/browse/HADOOP-1087
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.10.1
            Reporter: Koji Noguchi



2007-03-07 23:14:23,431 WARN org.apache.hadoop.mapred.TaskRunner: java.io.IOException: Server returned HTTP response code: 500 for URL: http://____:____/mapOutput?map=task_7810_m_000897_0&reduce=397
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1149)
  at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:121)
  at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:236)
  at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:199)
2007-03-07 23:14:23,431 WARN org.apache.hadoop.mapred.TaskRunner: task_7810_r_000397_0 adding host ____.com to penalty box, next contact in 279 seconds

This happened when one of the drives was full and not accessible at map time.

One mapper, in

    public void mergeParts() throws IOException {
      ...
      Path finalIndexFile = mapOutputFile.getOutputIndexFile(getTaskId());

failed on the first hash entry in mapred.local.dir and used the second entry instead.

Afterwards, the first dir entry became available again, and when the reducer tried to pull the map output,

    public static class MapOutputServlet extends HttpServlet {
      ...
      Path indexFileName = conf.getLocalPath(mapId+"/file.out.index");

used the first entry.

As a result, the servlet looked in an empty directory, and the reducer kept trying to pull
from the incorrect path and hung.
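
To illustrate the mismatch described above, here is a minimal, self-contained sketch. It is not the Hadoop code: the class name LocalDirMismatch, the two directory names, and the rule "hash the relative path to pick a starting entry, then skip entries that are unusable right now" are all assumptions made for illustration. It only shows how the same lookup can resolve to different directories when a drive's availability changes between map time and reduce time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class LocalDirMismatch {
    // Hypothetical stand-ins for two mapred.local.dir entries.
    static final List<String> LOCAL_DIRS =
        Arrays.asList("/disk1/mapred/local", "/disk2/mapred/local");

    // Assumed scheme: the hash of the relative path picks the preferred entry.
    static int preferredIndex(String relPath) {
        return (relPath.hashCode() & Integer.MAX_VALUE) % LOCAL_DIRS.size();
    }

    // Start at the preferred entry and skip directories that are unusable right now.
    static String resolve(String relPath, Set<String> unusableNow) {
        int start = preferredIndex(relPath);
        for (int i = 0; i < LOCAL_DIRS.size(); i++) {
            String dir = LOCAL_DIRS.get((start + i) % LOCAL_DIRS.size());
            if (!unusableNow.contains(dir)) {
                return dir + "/" + relPath;
            }
        }
        throw new IllegalStateException("no usable local dir for " + relPath);
    }

    public static void main(String[] args) {
        String rel = "task_7810_m_000897_0/file.out.index";
        String preferred = LOCAL_DIRS.get(preferredIndex(rel));

        // Map time: the preferred drive is full, so the index is written elsewhere.
        String written = resolve(rel, Collections.singleton(preferred));
        // Reduce time: the drive is back, so the same call picks the preferred entry.
        String lookedUp = resolve(rel, Collections.<String>emptySet());

        System.out.println("written at map time:  " + written);
        System.out.println("looked up at reduce:  " + lookedUp);
        System.out.println("same path? " + written.equals(lookedUp));
    }
}
```

In this sketch the two resolved paths differ, because the lookup depends on which drives are usable at the moment of the call rather than on where the file was actually written.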

(I wasn't sure whether this is a duplicate of HADOOP-895, since it is not reproducible
without a disk failure.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

