hadoop-common-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5170) Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
Date Thu, 02 Jul 2009 22:17:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726709#action_12726709 ]

Matei Zaharia commented on HADOOP-5170:

Arun, can you explain what the hard limit on capacity means? Is it option 2 in MAPREDUCE-532
that just disallows a queue from taking excess capacity? I'm still not sure how this can solve
the problems you brought up. In fact, as I pointed out in my earlier comment, even *without*
any limits defined, MAPREDUCE-516 can harm locality and utilization in its current form. Am
I missing something about per-job limits that makes them different from queue limits or from
no excess capacity being available?

Owen, I agree that having MAPREDUCE-516 be in terms of multi-slot tasks would solve some issues,
but it doesn't sound like it would solve Jonathan's problem for example. I think that coming
up with a general multi-resource sharing model will be difficult. Is there anything we can
do in the meantime to support this use case? For example, it would be trivial for me to implement
the per-node limits in the fair scheduler, but then they wouldn't be available to users of
other schedulers.

This might also be a good time to figure out exactly what scheduling functionality can go
into JobInProgress/TaskTracker/etc and what should go into schedulers. I didn't think the
limits in this patch added much complexity. They obey the contract of obtainNewMapTask, which
is that it may or may not return a task. Schedulers already have to deal with jobs that have
no tasks to launch on one heartbeat, and then have tasks on the next, because of speculation.
So any scheduler that works with that should more or less be okay if the job decides whether
to launch a task based on its total number of running tasks. If we agreed on some kind of contract,
then we would be able to implement common scheduling functionality in the mapreduce package
itself rather than in contrib. Otherwise, as long as there are multiple groups working
on scheduling on Hadoop, everyone will be worried that someone else's change will break their
future work.
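
The contract described above can be sketched in a few lines. This is a toy model, not the real
JobInProgress API; the class, fields, and the integer "task ids" are simplified stand-ins I made
up for illustration:

```java
// Sketch: a per-job running-task cap hiding behind the obtainNewMapTask
// contract. Because the method may legally return no task on any given
// heartbeat, a limit-aware job simply answers null once its cap is reached,
// and any scheduler that already tolerates "no task this heartbeat"
// (as it must, due to speculation) keeps working unchanged.
public class TaskLimitSketch {

    /** Simplified stand-in for a job with a running-map limit. */
    static class Job {
        int runningMaps = 0;
        int pendingMaps;
        final int maxRunningMaps; // -1 means no limit

        Job(int pendingMaps, int maxRunningMaps) {
            this.pendingMaps = pendingMaps;
            this.maxRunningMaps = maxRunningMaps;
        }

        /** Honors the contract: may or may not return a task id. */
        Integer obtainNewMapTask() {
            if (pendingMaps == 0) {
                return null; // nothing left to schedule
            }
            if (maxRunningMaps != -1 && runningMaps >= maxRunningMaps) {
                return null; // at the cap: same as "no task this heartbeat"
            }
            pendingMaps--;
            return runningMaps++; // hand out a fake task id
        }
    }

    public static void main(String[] args) {
        Job job = new Job(5, 2); // 5 pending maps, cap of 2 running
        System.out.println("first:  " + job.obtainNewMapTask());
        System.out.println("second: " + job.obtainNewMapTask());
        // Tasks remain pending, but the cap is reached, so the job
        // declines this heartbeat just as it would with no runnable task.
        System.out.println("third:  " + job.obtainNewMapTask());
    }
}
```

From the scheduler's point of view the third heartbeat is indistinguishable from a job that
momentarily has nothing to run, which is why the limits add little complexity.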

> Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
> ----------------------------------------------------------------------------
>                 Key: HADOOP-5170
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5170
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Jonathan Gray
>            Assignee: Matei Zaharia
>             Fix For: 0.21.0
>         Attachments: HADOOP-5170-tasklimits-v3-0.18.3.patch, tasklimits-v2.patch, tasklimits-v3-0.19.patch,
tasklimits-v3.patch, tasklimits-v4-20.patch, tasklimits-v4.patch, tasklimits.patch
> There are a number of use cases for being able to do this.  The focus of this jira should
be on finding the simplest implementation that satisfies the most use cases.
> This could be implemented as either a per-node maximum or a cluster-wide maximum.  It
seems that for most uses the former is preferable; however, either would fulfill the requirements
of this jira.
> Some of the reasons for allowing this feature (mine and from others on list):
> - I have some very large CPU-bound jobs.  I am forced to keep the max map/node limit
at 2 or 3 (on a 4-core node) so that I do not starve the DataNode and RegionServer.  I have
other jobs that are network-latency bound and would like to be able to run high numbers of
them concurrently on each node.  Though I can thread some jobs, some use cases are difficult
to thread (scanning from HBase), and threading adds significant complexity to the job compared
with letting Hadoop handle the concurrency.
> - Poor assignment of tasks to nodes creates situations where one node receives multiple
reducers for a job while other nodes receive none.  A limit of 1 reducer per node for that
job would prevent that from happening. (only works with per-node limit)
> - Poor man's MR job virtualization.  Since we can limit a job's resources, this gives much
more control in allocating and dividing up the resources of a large cluster.  (makes most sense
w/ cluster-wide limit)
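
For concreteness, per-job limits of this kind would be set in the job's configuration. The
property names below are illustrative assumptions in the style of the attached tasklimits
patches, not a committed API:

```xml
<!-- Illustrative per-job limits (property names are assumptions, not a
     committed API): cap this job at 1 reducer per node and 20 maps
     running cluster-wide; a value of -1 would mean "no limit". -->
<property>
  <name>mapred.max.reduces.per.node</name>
  <value>1</value>
</property>
<property>
  <name>mapred.running.map.limit</name>
  <value>20</value>
</property>
```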

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
