Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Fri, 1 Feb 2013 23:56:14 +0000 (UTC)
From: "Arun C Murthy (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12630471.1359760897613.233005.1359762974194@arcas>
In-Reply-To: <JIRA.12630471.1359760897613@arcas>
References: <JIRA.12630471.1359760897613@arcas>
Subject: [jira] [Commented] (YARN-371) Consolidate resource requests in
 AM-RM heartbeat
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569277#comment-13569277 ] 

Arun C Murthy commented on YARN-371:
------------------------------------

{quote}
I would propose the tweak of allowing a single ResourceRequest to encapsulate all the location information for a task. So instead of just a single location, a ResourceRequest would contain an array of locations, including nodes that it would be happy with, racks that it would be happy with, and possibly *. Side effects of this change would be a reduction in the amount of data that needs to be transferred in a heartbeat, as well in as the RM's memory footprint, becaused what used to be different requests for the same task are now able to share some common data.
{quote}

-1

The main advantage of the proposed model is that it is extremely compact in terms of the amount of state necessary, per application, on the ResourceManager for scheduling and the amount of information passed around between the ApplicationMaster & ResourceManager. This is crucial for scaling the ResourceManager. The amount of information, per application, in this model is always O(cluster size), whereas in the current Hadoop Map-Reduce JobTracker it is O(number of tasks) which could run into hundreds of thousands of tasks. For large jobs it is sufficient to ask for containers only on racks and not specific hosts since the ApplicationMaster can use them appropriately since each rack has many appropriate resources (i.e. input splits for MapReduce applications).

To be clear, we should avoid *task* specific view and stay 'resource-specific' view.
                
> Consolidate resource requests in AM-RM heartbeat
> ------------------------------------------------
>
>                 Key: YARN-371
>                 URL: https://issues.apache.org/jira/browse/YARN-371
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, resourcemanager, scheduler
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each resource request consists of a container count, a resource vector, and a location, which may be a node, a rack, or "*". When an application wishes to request a task run in multiple localtions, it must issue a request for each location.  This means that for a node-local task, it must issue three requests, one at the node-level, one at the rack-level, and one with * (any). These requests are not linked with each other, so when a container is allocated for one of them, the RM has no way of knowing which others to get rid of. When a node-local container is allocated, this is handled by decrementing the number of requests on that node's rack and in *. But when the scheduler allocates a task with a node-local request on its rack, the request on the node is left there.  This can cause delay-scheduling to try to assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow requests for containers only on a specific node or specific rack. While this is not a use case for MapReduce currently, it is conceivable that it might be something useful to support in the future, for example to schedule long-running services that persist state in a particular location, or for applications that generally care less about latency than data-locality.
> Lastly, the ability to understand which requests are for the same task will possibly allow future schedulers to make more intelligent scheduling decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate all the location information for a task.  So instead of just a single location, a ResourceRequest would contain an array of locations, including nodes that it would be happy with, racks that it would be happy with, and possibly *.  Side effects of this change would be a reduction in the amount of data that needs to be transferred in a heartbeat, as well in as the RM's memory footprint, becaused what used to be different requests for the same task are now able to share some common data.
> While this change breaks compatibility, if it is going to happen, it makes sense to do it now, before YARN becomes beta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira