hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces
Date Thu, 19 Oct 2006 17:43:37 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]

Sanjay Dahiya updated HADOOP-76:
--------------------------------

    Attachment: Hadoop-76.patch

This patch is up for review. 

Here is the list of changes included in this patch - 

Replaced recentTasks with a Map and added a new method in TaskInProgress, hasRanOnMachine(), which
looks at this Map and at hasFailedOnMachines(). This is used to avoid scheduling multiple reduce
instances of the same task on the same node. 
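
Roughly, the new check looks like the sketch below. This is a simplified illustration of the idea, not the exact patch code; the field types and bookkeeping are assumptions on my part.

import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TaskInProgress {
  // tracker name -> id of the task attempt that ran (or is running) there
  private Map<String, String> recentTasks = new TreeMap<String, String>();
  // trackers on which an attempt of this task has already failed
  private Set<String> machinesWhereFailed = new TreeSet<String>();

  boolean hasFailedOnMachines(String trackerName) {
    return machinesWhereFailed.contains(trackerName);
  }

  /** True if an attempt of this task ran, is running, or failed on the given
      tracker, so the scheduler should not place another attempt there. */
  boolean hasRanOnMachine(String trackerName) {
    return recentTasks.containsKey(trackerName) || hasFailedOnMachines(trackerName);
  }
}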

Added a PhasedRecordWriter, which takes a RecordWriter, a tempName, and a finalName. Another option
was to create a PhasedOutputFormat, but this approach seems more natural as it works with any existing
OutputFormat and RecordWriter. Records are written to tempName, and when commit is called they
are moved to finalName. 
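
A minimal sketch of the wrapper, assuming the old mapred RecordWriter interface (write(key, value) / close(reporter)). The constructor arguments and commit() follow the description above, but this is an illustration rather than the exact patch code.

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;

/** Wraps any RecordWriter: records go to tempName, commit() renames to finalName. */
public class PhasedRecordWriter implements RecordWriter {
  private final RecordWriter out;     // underlying writer, already bound to tempName
  private final FileSystem fs;
  private final Path tempName;
  private final Path finalName;

  public PhasedRecordWriter(RecordWriter out, FileSystem fs,
                            Path tempName, Path finalName) {
    this.out = out;
    this.fs = fs;
    this.tempName = tempName;
    this.finalName = finalName;
  }

  public void write(WritableComparable key, Writable value) throws IOException {
    out.write(key, value);            // all records land in the temporary file
  }

  public void close(Reporter reporter) throws IOException {
    out.close(reporter);
  }

  /** Called once the task is allowed to keep its output. */
  public void commit() throws IOException {
    fs.rename(tempName, finalName);   // promote temp output to the final name
  }
}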

ReduceTask.run() - if speculative execution is enabled, the reduce output is written to a
temp location using PhasedRecordWriter. After the task finishes, the output is moved to the final
location. 
If some other speculative instance finishes first, then TaskInProgress.shouldCloseForClosedJob()
returns true for the taskId. On the TaskTracker the task is killed via Process.destroy(), so the cleanup
code lives in TaskTracker instead of Task. The cleanup of Maps happens in Conf, which is probably
misplaced. We could refactor this part for both Map and Reduce and move the cleanup code to a
utility class which, given a Map or Reduce task, tracks the files generated and cleans them up if
needed. 
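
In outline, the flow is something like the sketch below. runWithPhasedOutput() and runReduce() are hypothetical stand-ins for the existing code, shown only to make the temp-then-commit sequence concrete.

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;

class PhasedOutputSketch {
  /** Write the reduce output through a PhasedRecordWriter and only promote it
      to the final name if this instance finishes (i.e. was not killed first). */
  static void runWithPhasedOutput(RecordWriter base, FileSystem fs,
                                  String tempName, String finalName,
                                  Reporter reporter) throws IOException {
    PhasedRecordWriter out =
        new PhasedRecordWriter(base, fs, new Path(tempName), new Path(finalName));
    try {
      runReduce(out, reporter);        // stand-in for the existing sort/reduce loop
      out.close(reporter);
      out.commit();                    // winner: rename temp output to finalName
    } catch (IOException e) {
      fs.delete(new Path(tempName));   // failed or killed before commit: drop temp output
      throw e;
    }
  }

  private static void runReduce(RecordWriter out, Reporter reporter)
      throws IOException {
    // placeholder for the existing reduce loop writing through 'out'
  }
}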

Added an extra attribute in TaskInProgress, runningSpeculative, to avoid running more than
one speculative instance of a ReduceTask. Too many Reduce instances for the same task could increase
the load on Map machines; this needs discussion. I can revert this change to allow some other
number of Reduce instances (MAX_TASK_FAILURES?). 
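
Schematically, the guard amounts to the following. This is a simplified illustration with made-up names; the actual scheduling code is more involved.

class SpeculativeGuard {
  private boolean runningSpeculative = false;

  /** Allow at most one speculative attempt, and never on a tracker that has
      already run (or failed) this task. */
  boolean canScheduleSpeculative(boolean hasRanOnThisTracker) {
    return !runningSpeculative && !hasRanOnThisTracker;
  }

  void speculativeStarted()  { runningSpeculative = true;  }
  void speculativeFinished() { runningSpeculative = false; }
}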

Comments are welcome.

> Implement speculative re-execution of reduces
> ---------------------------------------------
>
>                 Key: HADOOP-76
>                 URL: http://issues.apache.org/jira/browse/HADOOP-76
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.1.0
>            Reporter: Doug Cutting
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>         Attachments: Hadoop-76.patch, spec_reducev.patch
>
>
> As a first step, reduce task outputs should go to temporary files which are renamed when
> the task completes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
