hadoop-mapreduce-issues mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-207) Computing Input Splits on the MR Cluster
Date Sat, 04 Jul 2009 00:01:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727162#action_12727162 ]

Philip Zeyliger commented on MAPREDUCE-207:
-------------------------------------------

I've been poking around here and am running into a fair amount of friction with how different
task types are managed.

As far as I can tell, there are several ways that different task types are distinguished:

* There's a {{TaskType}} enum, which contains MAP, REDUCE, JOB_SETUP, JOB_CLEANUP, and TASK_CLEANUP.
 This is used quite a bit.
* TaskInProgress has isMapTask(), isJobCleanupTask(), isJobSetupTask().  I believe that TIP
can report both isMapTask() and isJobCleanupTask() on the same object and that reduces are
implied by !isMapTask().
* Task uses a hybrid approach.  There's MapTask and ReduceTask (a class hierarchy), but there's
also isMapTask(), isJobSetupTask(), isTaskCleanupTask(), and isJobCleanupTask().
* Schedulers and TaskTrackers for the most part only deal with MAP and REDUCE tasks.  Really,
these are "slot types", since other types of tasks can be run in them.  Schedulers are not
aware of the "special tasks"---the JobTracker schedules them "manually" on its own.
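
To make the overlap concrete, here is a minimal sketch of how the flag style and the enum style relate. It is not the actual Hadoop code: {{TaskTypeSketch}}, {{FlagStyleTask}}, {{classify()}} and {{slotType()}} are hypothetical names, and the precedence order in {{classify()}} is my assumption. Only the enum constants come from the list above.

```java
// Hypothetical sketch; these are NOT the real Hadoop classes.
public class TaskTypeSketch {
    // Same constants as listed for the TaskType enum above.
    enum TaskType { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP, TASK_CLEANUP }

    // Stand-in for the flag-based style: booleans that may overlap,
    // with "reduce" implied by !mapTask.
    static final class FlagStyleTask {
        final boolean mapTask;
        final boolean jobSetup;
        final boolean jobCleanup;
        final boolean taskCleanup;

        FlagStyleTask(boolean mapTask, boolean jobSetup,
                      boolean jobCleanup, boolean taskCleanup) {
            this.mapTask = mapTask;
            this.jobSetup = jobSetup;
            this.jobCleanup = jobCleanup;
            this.taskCleanup = taskCleanup;
        }
    }

    // Collapse the flags into a single TaskType. Assumption: the
    // "special task" flags take precedence over map/reduce.
    static TaskType classify(FlagStyleTask t) {
        if (t.jobSetup) return TaskType.JOB_SETUP;
        if (t.jobCleanup) return TaskType.JOB_CLEANUP;
        if (t.taskCleanup) return TaskType.TASK_CLEANUP;
        return t.mapTask ? TaskType.MAP : TaskType.REDUCE;
    }

    // The "slot type" view: only MAP or REDUCE, whatever runs in the slot.
    static TaskType slotType(FlagStyleTask t) {
        return t.mapTask ? TaskType.MAP : TaskType.REDUCE;
    }

    public static void main(String[] args) {
        // A job cleanup task that also reports itself as a map task.
        FlagStyleTask cleanup = new FlagStyleTask(true, false, true, false);
        System.out.println(classify(cleanup) + " runs in a "
                + slotType(cleanup) + " slot");

        // A plain reduce: no flags set at all.
        FlagStyleTask reduce = new FlagStyleTask(false, false, false, false);
        System.out.println(classify(reduce) + " runs in a "
                + slotType(reduce) + " slot");
    }
}
```

The first case prints "JOB_CLEANUP runs in a MAP slot", which is the situation described above: the same object answers yes to both the map question and the job cleanup question, so the task type and the slot type are really two separate things.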

Does this sound about right?

-- Philip

> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Philip Zeyliger
>
> Instead of computing the input splits as part of job submission, Hadoop could have a
> separate "job task type" that computes the input splits, therefore allowing that computation
> to happen on the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

