Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Mon, 11 Jun 2012 18:05:45 +0000 (UTC)
From: "Andrew Ferguson (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <1019792366.3371.1339437945892.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1489633945.51428.1339132523480.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-4327) Enhance CS to schedule
 accounting for both memory and cpu cores
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292933#comment-13292933 ] 

Andrew Ferguson commented on MAPREDUCE-4327:
--------------------------------------------

Hi Arun,

I'm excited to see this started -- I'm quite interested in the multi-resource scheduling problem. After reading through the patch, I have a few questions for you; hopefully this feedback will be helpful.

First off, I want to confirm my understanding is correct: this patch is designed to allocate resources to jobs within the same capacity queue based on the DRF-inspired ordering of their need for resources. It is not designed to do weighted DRF for the complete cluster. If I'm mistaken, some of my feedback may not apply.
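
As a rough illustration of what a DRF-inspired ordering within one queue could look like: the sketch below is not taken from the patch, and the Job class, the function names, and the numbers are all hypothetical. It assumes each job is ranked by its dominant share, i.e. the largest fraction of any single resource it currently holds, with the lowest share served first.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record of what a job currently holds in the queue.
    name: str
    cpu: int      # cores allocated
    mem_mb: int   # memory allocated, in MB


def dominant_share(job, total_cpu, total_mem_mb):
    """Largest fraction of any one resource that the job holds."""
    return max(job.cpu / total_cpu, job.mem_mb / total_mem_mb)


def drf_order(jobs, total_cpu, total_mem_mb):
    """Jobs sorted so the one with the smallest dominant share comes first."""
    return sorted(jobs, key=lambda j: dominant_share(j, total_cpu, total_mem_mb))


# Made-up queue with 32 cores and 64 GB of memory.
jobs = [Job("a", 8, 4096), Job("b", 2, 16384), Job("c", 4, 4096)]
order = [j.name for j in drf_order(jobs, 32, 65536)]
print(order)
```

Here job "a" is CPU-dominant and job "b" is memory-dominant, yet both hold a quarter of their dominant resource, so job "c" (one eighth) would be offered resources first.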

1) Are you planning to change the definition of a queue's capacity? Currently, it is defined as a percentage of the parent queue's total memory. Alternatively, queues could be specified with a percentage of each resource. For example, I could have one queue with "75% CPU and 50% RAM" and a second with "25% CPU and 50% RAM".

2) Do you plan to change how spare capacity is allocated? My understanding is that it's currently shared proportionally, based on the queue capacities, an approach that seems intuitive for cluster operators. With a multi-resource setup, however, running DRF on the pool of spare resources would provide higher utilization. (I can provide an example of this if you'd like.)

3) Are you planning to support priorities or weights within the queues? IIRC, this was supported in the MR1 scheduler, and the DRF paper describes a weighted extension.

4) Lastly, with the increasing flexibility of the YARN scheduler, I think it makes sense to better support heterogeneous clusters. Currently, yarn.nodemanager.resource.memory-mb is a constant across the cluster, but with a scheduler capable of packing differently shaped resource containers onto each node, heterogeneous nodes would be a natural extension. (This is more of an observation than a question. :-)
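
To make point 2 concrete, here is a minimal sketch with made-up numbers. It assumes that "shared proportionally" means each queue receives a fixed slice of every spare resource, and it compares that with a simple progressive-filling version of DRF over the whole spare pool. The function names, queue names, and task shapes are hypothetical and not part of the CapacityScheduler.

```python
def proportional_split(spare, fractions, demands):
    """Tasks per queue when each queue gets a fixed slice of every resource."""
    counts = {}
    for queue, frac in fractions.items():
        share = [int(s * frac) for s in spare]
        counts[queue] = min(sh // d for sh, d in zip(share, demands[queue]))
    return counts


def drf_fill(spare, demands):
    """Progressive filling: repeatedly give one task to the queue with the
    smallest dominant share until no queue's next task fits."""
    used = [0] * len(spare)
    alloc = {q: [0] * len(spare) for q in demands}
    counts = {q: 0 for q in demands}
    active = set(demands)
    while active:
        queue = min(sorted(active),
                    key=lambda q: max(a / s for a, s in zip(alloc[q], spare)))
        need = demands[queue]
        if all(u + n <= s for u, n, s in zip(used, need, spare)):
            used = [u + n for u, n in zip(used, need)]
            alloc[queue] = [a + n for a, n in zip(alloc[queue], need)]
            counts[queue] += 1
        else:
            active.discard(queue)  # this queue's next task no longer fits
    return counts


def total_used(counts, demands, n_resources):
    """Sum of each resource consumed by the given task counts."""
    return [sum(counts[q] * demands[q][i] for q in counts)
            for i in range(n_resources)]


# Spare pool of 12 cores and 12 GB; queue A is CPU-heavy, queue B memory-heavy.
spare = (12, 12)
demands = {"A": (2, 1), "B": (1, 2)}    # (cores, GB) per task
fractions = {"A": 0.5, "B": 0.5}        # equal queue capacities

prop = proportional_split(spare, fractions, demands)
drf = drf_fill(spare, demands)
print(prop, total_used(prop, demands, 2))
print(drf, total_used(drf, demands, 2))
```

With these numbers the fixed slices leave each queue with an unusable remainder of its non-dominant resource, while progressive filling lets the two complementary task shapes pack the pool completely. Real workloads will not always be this complementary, so the gap may be smaller in practice.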


Looking forward to further discussions.

cheers,
Andrew


> Enhance CS to schedule accounting for both memory and cpu cores
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4327
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4327
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, resourcemanager, scheduler
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4327.patch
>
>
> With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.
