hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arkady Borkovsky <ark...@yahoo-inc.com>
Subject Re: [jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces
Date Fri, 20 Oct 2006 01:46:40 GMT
Just a question:

did we consider "speculative execution of reduces with sub-splitting"?
That is, for each of the last / slow reduce tasks run several task 
subslitting the input of the original task?
The implementation would be probably more complicated, but the benefits 
may be huge -- currently, it is not uncommon to see the single last 
reduce task running for hours (if not for days).

-- ab

On Oct 19, 2006, at 10:43 AM, Sanjay Dahiya (JIRA) wrote:

>      [ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]
>
> Sanjay Dahiya updated HADOOP-76:
> --------------------------------
>
>     Attachment: Hadoop-76.patch
>
> This patch is up for review.
>
> Here is the list of changes included in this patch -
>
> Replaced recentTasks to a Map, added a new method in TaskInProgress 
> hasRanOnMachine, which looks at this Map and hasFailedOnMachines(). 
> This is used to avoid scheduling multiple reduce instances of same 
> task on the same node.
>
> Added a PhasedRecordWriter, which takes a RecordWriter, tempName, 
> finalName. Another option was to create a PhasedOutputFormat, this 
> seems more natural as it works with any existing OutputFormat and 
> RecordWriter. Records are written to tempName and when commit is 
> called they are moved to finalName.
>
> ReduceTask.run() - if speculative execution is enabled then reduce 
> output is written to a temp location using PhasedRecordWriter. After 
> task finishes the output is written to a final location.
> If some other speculative instance finishes first then 
> TaskInProgress.shouldCloseForClosedJob() returns true for the taskId. 
> On TaskTracker the task is killed by Process.destroy() so cleanup code 
> is in TaskTracker instead of Task. The cleanup of Maps happen in Conf, 
> which is probably misplaced. We could refactor this part for both Map 
> and Reduce and move cleanup code to some utility classes which, given 
> a Map and Reduce task track the files generated and cleanup if needed.
>
> Added an extra attribute in TaskInProgress - runningSpeculative, to 
> avoid running more than one speculative instances of ReduceTask. Too 
> many Reduce instances for same task could increase load on Map 
> machines, this needs discussion. I can revert this change back to 
> allow some other number of instances of Reduces (MAX_TASK_FAILURES?).
>
> comments
>
>> Implement speculative re-execution of reduces
>> ---------------------------------------------
>>
>>                 Key: HADOOP-76
>>                 URL: http://issues.apache.org/jira/browse/HADOOP-76
>>             Project: Hadoop
>>          Issue Type: Improvement
>>          Components: mapred
>>    Affects Versions: 0.1.0
>>            Reporter: Doug Cutting
>>         Assigned To: Sanjay Dahiya
>>            Priority: Minor
>>         Attachments: Hadoop-76.patch, spec_reducev.patch
>>
>>
>> As a first step, reduce task outputs should go to temporary files 
>> which are renamed when the task completes.
>
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the 
> administrators: 
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: 
> http://www.atlassian.com/software/jira
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message