hadoop-common-dev mailing list archives

From "Mikkel Kamstrup Erlandsen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-7) MapReduce has a series of problems concerning task-allocation to worker nodes
Date Tue, 18 Jul 2006 11:44:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-7?page=comments#action_12421847 ] 
            
Mikkel Kamstrup Erlandsen commented on HADOOP-7:
------------------------------------------------

This is likely me being dumb, but I don't think this issue is fixed.

When I run any of the provided example programs (wordcount, grep, and also pi with speculative
execution enabled), reduce tasks do not start until all map tasks have completed.

My cluster contains three nodes and I am running Hadoop 0.4.0.

> MapReduce has a series of problems concerning task-allocation to worker nodes
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-7
>                 URL: http://issues.apache.org/jira/browse/HADOOP-7
>             Project: Hadoop
>          Issue Type: Bug
>         Environment: All
>            Reporter: Mike Cafarella
>             Fix For: 0.1.0
>
>         Attachments: jobtracker.patch
>
>
> The MapReduce JobTracker is not great at allocating tasks to TaskTracker worker nodes.
> Here are the problems:
> 1) There is no speculative execution of tasks
> 2) Reduce tasks must wait until all map tasks are completed before doing any work
> 3) TaskTrackers don't distinguish between Map and Reduce jobs.  Also, the number of
> tasks at a single node is limited to some constant.  That means you can get weird deadlock
> problems upon machine failure.  The reduces take up all the available execution slots,
> but they don't do productive work, because they're waiting for a map task to complete.
> Of course, that map task won't even be started until the reduce tasks finish, so you
> can see the problem...
> 4) The JobTracker is so complicated that it's hard to fix any of these.
> The right solution is a rewrite of the JobTracker to be a lot more flexible in task handling.
> It has to be a lot simpler.  One way to make it simpler is to add an abstraction I'll call
> "TaskInProgress".  Jobs are broken into chunks called TasksInProgress.  All the TaskInProgress
> objects must be complete, somehow, before the Job is complete.
> A single TaskInProgress can be executed by one or more Tasks.  TaskTrackers are assigned Tasks.
> If a Task fails, we report it back to the JobTracker, where the TaskInProgress lives.
> The TIP can then decide whether to launch additional Tasks or not.
> Speculative execution is handled within the TIP.  It simply launches multiple Tasks in parallel.
> The TaskTrackers have no idea that these Tasks are actually doing the same chunk of work.
> The TIP is complete when any one of its Tasks is complete.
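
For readers following the TaskInProgress description quoted above, here is a minimal sketch of how such an abstraction might look. It is not the code from jobtracker.patch or from any Hadoop release: the class name TaskInProgressSketch, the maxAttempts limit, and every method name are assumptions made purely for illustration. It shows only the three behaviours the description names: a TIP launches one or more Tasks, failures are reported back to the TIP, and the TIP is complete as soon as any one Task succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the TaskInProgress idea; not Hadoop's actual class. */
class TaskInProgressSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    /** One attempt at the TIP's chunk of work, as handed to a TaskTracker. */
    static final class Task {
        final String id;
        State state = State.RUNNING;
        Task(String id) { this.id = id; }
    }

    private final String tipId;
    private final int maxAttempts; // assumed limit; the issue text does not specify one
    private final List<Task> attempts = new ArrayList<>();

    TaskInProgressSketch(String tipId, int maxAttempts) {
        this.tipId = tipId;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Launches another Task for the same chunk of work. Calling this while an
     * earlier Task is still running models speculative execution. Returns
     * empty if the TIP is already complete or the attempt limit is reached.
     */
    Optional<Task> launchTask() {
        if (isComplete() || attempts.size() >= maxAttempts) {
            return Optional.empty();
        }
        Task task = new Task(tipId + "_" + attempts.size());
        attempts.add(task);
        return Optional.of(task);
    }

    /** Called when a TaskTracker reports that a Task finished successfully. */
    void reportSuccess(Task task) { task.state = State.SUCCEEDED; }

    /** Called when a TaskTracker reports that a Task failed. */
    void reportFailure(Task task) { task.state = State.FAILED; }

    /** The TIP is complete when any one of its Tasks has succeeded. */
    boolean isComplete() {
        for (Task task : attempts) {
            if (task.state == State.SUCCEEDED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TaskInProgressSketch tip = new TaskInProgressSketch("tip_0001", 3);
        Task first = tip.launchTask().get();
        Task speculative = tip.launchTask().get(); // duplicate of the same work
        tip.reportFailure(first);
        System.out.println(tip.isComplete()); // prints false
        tip.reportSuccess(speculative);
        System.out.println(tip.isComplete()); // prints true
    }
}
```

In this sketch the two Tasks know nothing about each other, which mirrors the point that TaskTrackers have no idea they are running the same chunk of work. Only the TIP tracks that relationship.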

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
