hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-76) Implement speculative re-execution of reduces
Date Wed, 11 Oct 2006 13:36:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-76?page=comments#action_12441456 ] 
            
Sanjay Dahiya commented on HADOOP-76:
-------------------------------------

Here is a list of code-level changes; I will test them in the meantime.

- Add an extra JobConf configuration option, runSpeculativeReduces.

- TaskInProgress maintains a list of nodes where it has already run (or is running). This
will be used to avoid scheduling a speculative instance on a node where the task is already
running or has failed in the past. [TIP already contains a list of nodes where its task failed.]

- Another option: if *any* reduce task is already assigned to this TT and is still running,
then the TT is not assigned a speculative task. [comments?]

- TIP.hasSpeculativeTask() now checks for reduce tasks as well; currently it checks only
map tasks. The exact conditions (timeouts) under which a reduce task should be executed
speculatively are open for discussion. Using Johan's condition
(finishedReduces / numReduceTasks >= 0.7) for testing until then.

- JobInProgress.findNewTask looks for speculative tasks (TIP.hasSpeculativeTask()) and checks
whether the task has already run on the same task tracker.

- If speculative execution of reduces is enabled, ReduceTask.run() creates a temp file
name for the reduce output. When the reduce task finishes, it checks whether the output file
has already been written by some other reduce instance. If it has, the temp output is deleted;
otherwise the task renames its temp output to the final output.

- TaskTracker.TIP.cleanup() also cleans up the reduce task's temp file if the task is killed
before it finishes.


- JobTracker.pollForTaskWithClosedJob(), TIP.shouldCloseForClosedJob() - return true if a
speculative reduce task finished first; this ultimately propagates down to the TT, which
kills and cleans up the task.
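
The temp-file handling described in the list above could look roughly like the sketch
below. It is only an illustration under stated assumptions: it uses java.nio.file on a
local filesystem instead of Hadoop's own FileSystem API, and the class and method names
(SpeculativeOutputCommit, commit) are hypothetical, not taken from the patch.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of Hadoop or the patch. */
final class SpeculativeOutputCommit {
    private SpeculativeOutputCommit() {}

    /**
     * Promotes tempOutput to finalOutput unless another reduce instance
     * has already written the final output.
     *
     * @return true if this instance's output became the final output
     */
    static boolean commit(Path tempOutput, Path finalOutput) throws IOException {
        if (Files.exists(finalOutput)) {
            // Another instance finished first: discard our temp output.
            Files.deleteIfExists(tempOutput);
            return false;
        }
        try {
            // Without REPLACE_EXISTING, move fails if the target already exists,
            // which covers an instance that finished between the check and the move.
            Files.move(tempOutput, finalOutput);
            return true;
        } catch (FileAlreadyExistsException e) {
            Files.deleteIfExists(tempOutput);
            return false;
        }
    }
}
```

With two instances writing to separate temp files, only the first call to commit() for a
given final path returns true; the later one deletes its own temp file and returns false.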

The exact conditions (timeouts) under which a reduce task should be executed speculatively
are open for discussion.
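
For reference while that discussion is open, the interim test condition
(finishedReduces / numReduceTasks >= 0.7) combined with the "not on a node already tried"
check might be sketched as follows. The class name, method name, and parameters are
assumptions made for illustration; the actual TIP and JobInProgress code is organized
differently.

```java
import java.util.Set;

/** Hypothetical policy sketch for illustration only; not the patch's code. */
final class SpeculativeReducePolicy {
    /** Interim threshold used for testing, per the condition quoted above. */
    static final double FINISHED_FRACTION = 0.7;

    private SpeculativeReducePolicy() {}

    /**
     * @param finishedReduces number of reduce tasks that have completed
     * @param numReduceTasks  total number of reduce tasks in the job
     * @param nodesTried      trackers where this task is running, ran, or failed
     * @param trackerName     tracker asking for a new task
     */
    static boolean shouldSpeculate(int finishedReduces, int numReduceTasks,
                                   Set<String> nodesTried, String trackerName) {
        if (numReduceTasks <= 0) {
            return false; // nothing to speculate on; also avoids dividing by zero
        }
        // Cast to double so the ratio is not truncated by integer division.
        double fraction = (double) finishedReduces / numReduceTasks;
        return fraction >= FINISHED_FRACTION && !nodesTried.contains(trackerName);
    }
}
```

The threshold is kept as a named constant so it can be replaced once the timeout-based
conditions are settled.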

comments? 

> Implement speculative re-execution of reduces
> ---------------------------------------------
>
>                 Key: HADOOP-76
>                 URL: http://issues.apache.org/jira/browse/HADOOP-76
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.1.0
>            Reporter: Doug Cutting
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>         Attachments: spec_reducev.patch
>
>
> As a first step, reduce task outputs should go to temporary files which are renamed when
> the task completes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
