hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1635) ResourceEstimator does not work after MAPREDUCE-842
Date Mon, 29 Mar 2010 08:35:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1635:

    Attachment: patch-1635.txt

I think the solution is to move the calculation of the task output size into Task, instead of
having the TaskTracker try to construct the output file path and fail. Task already has all the
information MapOutputFile needs, so Task can set the output size in its last status update,
before calling umbilical.done().
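
A minimal sketch of the proposed ordering, written in plain Java so it runs on its own.
SketchTask, SketchStatus and SketchUmbilical are hypothetical stand-ins for Task, TaskStatus
and the umbilical protocol, not Hadoop's actual classes. The task records its output size in
its last status update and only then calls done(), so the receiving side, which (as assumed
here) only knows what was last reported, already has the size when done() arrives. The status
field defaults to -1, following the convention that -1 means the size could not be calculated.

```java
// Hypothetical stand-in for Task: it can compute its own output size locally.
class SketchTask {
    private final SketchStatus status = new SketchStatus();
    private final SketchUmbilical umbilical;

    SketchTask(SketchUmbilical umbilical) {
        this.umbilical = umbilical;
    }

    // Put the output size in the last status update, then signal done.
    void finish(long localOutputSize) {
        status.outputSize = localOutputSize;
        umbilical.statusUpdate(status);
        umbilical.done();
    }

    public static void main(String[] args) {
        SketchUmbilical umbilical = new SketchUmbilical();
        new SketchTask(umbilical).finish(4096L);
        System.out.println(umbilical.sizeSeenAtDone); // 4096
    }
}

// Hypothetical stand-in for TaskStatus; -1 means "output size unknown".
class SketchStatus {
    long outputSize = -1;
}

// Hypothetical stand-in for the umbilical protocol, seen from the receiving side.
class SketchUmbilical {
    private long lastReportedSize = -1;   // receiver's copy of the latest report
    long sizeSeenAtDone = Long.MIN_VALUE; // sentinel: done() not called yet

    void statusUpdate(SketchStatus status) {
        lastReportedSize = status.outputSize;
    }

    void done() {
        // Like reportDone(): only what was already reported is available here.
        sizeSeenAtDone = lastReportedSize;
    }
}
```

Because the size travels with the final status update, nothing on the receiving side has to
locate the task's output file itself.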

The attached patch implements this fix. I added a MiniMR test that checks task output sizes for
a map-only job, a map-reduce job and a failed job.

In trunk, the "reported output size..." log message in TaskTracker.TaskInProgress.reportDone()
is meaningless, because setOutputSize() is called after reportDone(). With the attached patch
the message is meaningful, and I verified that it prints the correct value.

The patch removes the following null checks from the code:
-      Path tmp_output =  mapOutputFile.getOutputFile();
-      if(tmp_output == null)
-        return 0;
-      FileSystem localFS = FileSystem.getLocal(conf);
-      FileStatus stat = localFS.getFileStatus(tmp_output);
-      if(stat == null)
-        return 0;
These checks are unnecessary because neither mapOutputFile.getOutputFile() nor
localFS.getFileStatus(tmp_output) ever returns null: each call either returns a valid value or
throws an exception, and the method already handles exceptions. The null branches are therefore
unreachable. Moreover, their return value of 0 contradicts the documented contract that the
output size should be -1 when it cannot be calculated.
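
The same argument can be illustrated with java.nio.file, whose Files.size() likewise either
returns a value or throws an IOException (for example NoSuchFileException) and never returns
null. The method below is a hypothetical analogue of the output-size lookup, not the patched
Hadoop code: without null branches, the only fallback is the exception handler, which returns
the documented -1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputSizeSketch {
    // Hypothetical analogue of the output-size lookup. Files.size() either
    // returns the size or throws, so null checks would be unreachable.
    static long outputSizeOrMinusOne(Path outputFile) {
        try {
            return Files.size(outputFile);
        } catch (IOException e) {
            // Documented contract: -1 when the size cannot be calculated.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Path output = dir.resolve("file.out");
        Files.write(output, new byte[] {1, 2, 3});

        System.out.println(outputSizeOrMinusOne(output));                     // 3
        System.out.println(outputSizeOrMinusOne(dir.resolve("missing.out"))); // -1
    }
}
```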

Also, TaskStatus.outputSize is initialized to -1 to handle task failures.

> ResourceEstimator does not work after MAPREDUCE-842
> ---------------------------------------------------
>                 Key: MAPREDUCE-1635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>         Attachments: patch-1635.txt
> MAPREDUCE-842 changed the Child's mapred.local.dir to use attemptDir as the base local directory.
> It is also assumed that org.apache.hadoop.mapred.MapOutputFile always gets the Child's mapred.local.dir.
> But MapOutputFile.getOutputFile() is called with the TaskTracker's conf, so it does not
> find the output file. Thus TaskTracker.tryToGetOutputSize() always returns zero.
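
The failure described in the issue can be sketched with plain paths. The directory names and
the relative output location below are hypothetical, not Hadoop's real local-dir layout; the
sketch only shows that a file written relative to the child's attempt directory cannot be found
when the same relative path is resolved against the TaskTracker's base local directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalDirSketch {
    // Hypothetical relative location of a map output; not Hadoop's real layout.
    static final String RELATIVE_OUTPUT = "output/file.out";

    // Resolve the output file against whichever local dir the caller's conf holds.
    static Path resolveOutput(Path localDir) {
        return localDir.resolve(RELATIVE_OUTPUT);
    }

    public static void main(String[] args) throws IOException {
        Path trackerLocalDir = Files.createTempDirectory("tt-local");
        // After MAPREDUCE-842 the child's local dir is the attempt directory.
        Path childLocalDir = trackerLocalDir.resolve("attempt_sketch_0001");

        // The child writes its output relative to its own local dir.
        Path written = resolveOutput(childLocalDir);
        Files.createDirectories(written.getParent());
        Files.write(written, new byte[] {1, 2, 3});

        // Resolving with the tracker's local dir points at a different path.
        Path lookedUp = resolveOutput(trackerLocalDir);
        System.out.println(Files.exists(written));  // true
        System.out.println(Files.exists(lookedUp)); // false
    }
}
```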

This message is automatically generated by JIRA.
