Date: Mon, 4 Feb 2013 15:00:19 +0000 (UTC)
From: "Arun C Murthy (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-371) Consolidate resource requests in AM-RM heartbeat

    [ https://issues.apache.org/jira/browse/YARN-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570314#comment-13570314 ]

Arun C Murthy commented on YARN-371:
------------------------------------

{quote}
Looks like there's a misunderstanding here - Sandy talks about reducing the memory requirements of the RM. If I understand the proposal correctly, the number of resource request objects sent by the AM in MR would be reduced from five (three node-local, one rack-local, one ANY) to one resource request with an array of locations (host names) of length five.
{quote}

Please read my explanation again. This change is *explicitly* against the design goals of the YARN ResourceManager and would increase the memory requirements of the RM by a couple of orders of magnitude.

Hadoop MR applications routinely have 100K+ tasks. The proposed change in this jira would require 100K+ resource-requests (one per task). Currently, in YARN, the same demand can be expressed in O(nodes + racks + 1) resource-requests, which is ~O(5000) on even the largest clusters known today.

So, in effect, this change would be a significant regression: 100,000 resource-requests v/s the ~5,000 needed today.

bq. BTW Arun, immediately vetoing an issue in the first comment is not conducive to a balanced discussion!

Tom - You can read it as a veto, or you can read it as *I strongly disagree, since this is against the goals of the project and a significant regression*. IAC, we should allow for people's communication styles... and keep discussions technical - I'd appreciate that.
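
To make the arithmetic above concrete, here is a minimal sketch of the kind of per-location aggregation the current protocol relies on; the class and field names are simplified stand-ins for illustration, not the actual org.apache.hadoop.yarn.api.records.ResourceRequest API:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a resource request keyed by location; an
// illustration of the bookkeeping, not the real YARN ResourceRequest class.
class LocationRequest {
    final String location;   // a host name, a rack name, or "*"
    int numContainers;       // aggregated across all tasks wanting this location

    LocationRequest(String location) {
        this.location = location;
    }
}

class AskTable {
    private final Map<String, LocationRequest> asks =
        new HashMap<String, LocationRequest>();

    // Registering a node-local task touches exactly three entries: its host,
    // its rack, and "*". 100K+ tasks therefore collapse into
    // O(nodes + racks + 1) request objects, roughly ~5000 on large clusters.
    void addNodeLocalTask(String host, String rack) {
        increment(host);
        increment(rack);
        increment("*");
    }

    private void increment(String location) {
        LocationRequest req = asks.get(location);
        if (req == null) {
            req = new LocationRequest(location);
            asks.put(location, req);
        }
        req.numContainers++;
    }

    int size() {
        return asks.size();
    }
}
{code}

Whether the consolidated request proposed in the issue description below can preserve this collapse, or necessarily becomes one object per task, is the point of contention in the thread.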
> Consolidate resource requests in AM-RM heartbeat
> ------------------------------------------------
>
>                 Key: YARN-371
>                 URL: https://issues.apache.org/jira/browse/YARN-371
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api, resourcemanager, scheduler
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> Each AMRM heartbeat consists of a list of resource requests. Currently, each resource request consists of a container count, a resource vector, and a location, which may be a node, a rack, or "*". When an application wishes to request that a task run in any of multiple locations, it must issue a request for each location. This means that for a node-local task, it must issue three requests: one at the node level, one at the rack level, and one with * (any). These requests are not linked with each other, so when a container is allocated for one of them, the RM has no way of knowing which of the others to get rid of. When a node-local container is allocated, this is handled by decrementing the number of requests on that node's rack and in *. But when the scheduler allocates a rack-local container for a task that also has a node-local request, the request on the node is left there. This can cause delay scheduling to try to assign a container on a node that nobody cares about anymore.
> Additionally, unless I am missing something, the current model does not allow requests for containers only on a specific node or a specific rack. While this is not a use case for MapReduce currently, it is conceivable that it might be useful to support in the future, for example to schedule long-running services that persist state in a particular location, or for applications that generally care less about latency than data locality.
> Lastly, the ability to understand which requests are for the same task could allow future schedulers to make more intelligent scheduling decisions, as well as permit a more exact understanding of request load.
> I would propose the tweak of allowing a single ResourceRequest to encapsulate all the location information for a task. So instead of just a single location, a ResourceRequest would contain an array of locations, including nodes that it would be happy with, racks that it would be happy with, and possibly *. Side effects of this change would be a reduction in the amount of data that needs to be transferred in a heartbeat, as well as in the RM's memory footprint, because what used to be different requests for the same task are now able to share some common data.
> While this change breaks compatibility, if it is going to happen, it makes sense to do it now, before YARN becomes beta.
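
For contrast with the sketch above, a minimal sketch of the shape the description proposes; the class, field, host, and rack names below are hypothetical and only illustrate a single request carrying all acceptable locations for a task:

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical consolidated request, one per task; not an actual YARN class.
class ConsolidatedRequest {
    final List<String> locations;  // hosts and racks it would be happy with, possibly "*"
    final int memoryMb;            // resource vector, simplified here to memory only
    final int numContainers;

    ConsolidatedRequest(List<String> locations, int memoryMb, int numContainers) {
        this.locations = locations;
        this.memoryMb = memoryMb;
        this.numContainers = numContainers;
    }
}

class Example {
    // One request that replaces today's node-level + rack-level + "*" triple
    // for a single task ("host17" and "rack4" are made-up placeholders).
    static final ConsolidatedRequest REQUEST =
        new ConsolidatedRequest(Arrays.asList("host17", "rack4", "*"), 1024, 1);
}
{code}

When a container is allocated against such a request, the scheduler can decrement that one request, rather than reconciling separate node, rack, and ANY entries as the description discusses.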