hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4472) Should we move out the creation of setup/cleanup tasks from JobInProgress.initTasks()?
Date Wed, 29 Oct 2008 04:20:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643406#action_12643406 ]

Amareshwari Sriramadasu commented on HADOOP-4472:
-------------------------------------------------

bq. I don't think that is right. I think we should have a SetupTask and a CleanupTask that
are both id = 0. The isMap boolean needs to be replaced with an enumeration. {SETUP, MAP,
REDUCE, CLEANUP}. There should be tips associated with both setup and cleanup.
This is proposed in HADOOP-4421. 
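
A minimal sketch of the quoted suggestion, assuming an enumeration in place of the isMap boolean. The names TaskKind and TaskKey are hypothetical and are not taken from the Hadoop source or from HADOOP-4421. It only illustrates that a setup task and a cleanup task can both carry id = 0 and remain distinct once the task kind is part of the identity.

```java
// Hypothetical names throughout; this is not the Hadoop API.
enum TaskKind { SETUP, MAP, REDUCE, CLEANUP }

final class TaskKey {
    final TaskKind kind; // replaces a two-valued isMap flag
    final int id;

    TaskKey(TaskKind kind, int id) {
        this.kind = kind;
        this.id = id;
    }

    // Two keys are equal only if both the kind and the id match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TaskKey)) {
            return false;
        }
        TaskKey other = (TaskKey) o;
        return kind == other.kind && id == other.id;
    }

    @Override
    public int hashCode() {
        return 31 * kind.hashCode() + id;
    }

    @Override
    public String toString() {
        return kind + "_" + id;
    }
}
```

With this shape, new TaskKey(TaskKind.SETUP, 0) and new TaskKey(TaskKind.CLEANUP, 0) do not collide, which a boolean flag cannot express.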

bq. I don't see why the JT should ever poll the state. It knows the state changed via the
heartbeat.
Currently there is no state change for setup: both initTasks and setup happen in the PREP state.

bq. Furthermore, it should be notified again when the setup task is finished and the rest
of the job is runnable.
I think it makes sense to have a RUNNABLE state that the job enters when the setup completes.

bq. I propose that we don't fix this issue at all for now. Instead we should look at HADOOP-4421.
+1. We can also add the RUNNABLE state through HADOOP-4421.
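
A rough sketch of the state change discussed above, under the assumption that a job moves from PREP to a new RUNNABLE state when the setup task completes and that interested parties are notified rather than polling. JobPhase, JobPhaseListener and JobPhaseTracker are hypothetical names for illustration. RUNNABLE is only the state proposed in this thread, not an existing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only PREP is a state mentioned as existing in this thread.
enum JobPhase { PREP, RUNNABLE }

interface JobPhaseListener {
    void phaseChanged(JobPhase from, JobPhase to);
}

final class JobPhaseTracker {
    private JobPhase phase = JobPhase.PREP;
    private final List<JobPhaseListener> listeners = new ArrayList<>();

    void addListener(JobPhaseListener listener) {
        listeners.add(listener);
    }

    JobPhase phase() {
        return phase;
    }

    // Intended to be called when a heartbeat reports that setup finished.
    void setupCompleted() {
        if (phase != JobPhase.PREP) {
            return; // ignore duplicate reports
        }
        JobPhase old = phase;
        phase = JobPhase.RUNNABLE;
        for (JobPhaseListener listener : listeners) {
            listener.phaseChanged(old, phase);
        }
    }
}
```

A scheduler registered as a listener would learn exactly once that the rest of the job is runnable, without polling.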

> Should we move out the creation of setup/cleanup tasks from JobInProgress.initTasks()?
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4472
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Vivek Ratan
>            Assignee: Amareshwari Sriramadasu
>
> JobInProgress.initTasks() creates TIPs for map and reduce tasks, and also the newly-introduced
> setup and cleanup tasks. initTasks() is called by the schedulers, as for reasons of memory
> optimizations, schedulers may choose to initialize M/R tasks at various moments (the Capacity
> Scheduler, for example, calls initTasks() just when it considers a job for running). One can
> say that Schedulers 'own' the initialization of M/R tasks in a job. Furthermore the JT 'owns'
> the setup and cleanup tasks (it schedules them, and Schedulers are unaware of these tasks).
> This causes a problematic dependency between the JT and a Scheduler. For example, the Capacity
> Scheduler calls initTasks() and immediately calls JobInProgress.obtainNewMapTask for a map
> task. This is a problem today, because we cannot run any map or reduce tasks before the setup
> task is run, which the Capacity Scheduler is not aware of.
> Either all Schedulers are explicitly aware of setup/cleanup tasks and their dependencies
> with M/R tasks (in which case, Schedulers 'own' the creation and scheduling of all these tasks
> correctly), or the JT 'owns' the setup/cleanup tasks and Schedulers are completely unaware
> of them (in which case, the creation of setup/cleanup tasks must be moved out of initTasks
> into a separate method which is called by the JT).
> I think the latter is the right way to go (unless we implement HADOOP-4421, in which
> case the former option may be viable as well).
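
The second option in the description above (the JT 'owns' setup/cleanup) could look roughly like the following. This is a simplified sketch, not the actual JobInProgress code. SketchJob, initSetupAndCleanup and setupFinished are hypothetical names, and the signatures of initTasks and obtainNewMapTask are reduced for illustration and do not match the real methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: setup/cleanup creation lives outside initTasks().
final class SketchJob {
    private final Deque<String> pendingMaps = new ArrayDeque<>();
    private boolean setupCreated = false;
    private boolean setupDone = false;

    // Called by a scheduler: creates only M/R tasks (reduces omitted for brevity).
    void initTasks(int numMaps) {
        for (int i = 0; i < numMaps; i++) {
            pendingMaps.add("map_" + i);
        }
    }

    // Called by the JT only, so schedulers stay unaware of setup/cleanup.
    void initSetupAndCleanup() {
        setupCreated = true;
    }

    // Called by the JT when the setup task reports completion.
    void setupFinished() {
        if (setupCreated) {
            setupDone = true;
        }
    }

    // Returns null until setup has finished, or when no map task is pending.
    String obtainNewMapTask() {
        return setupDone ? pendingMaps.poll() : null;
    }
}
```

In this sketch a scheduler may call initTasks() and then obtainNewMapTask() right away, as the Capacity Scheduler does, and simply gets nothing until the JT has run the setup task.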

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

