Date: Wed, 11 Dec 2013 10:16:12 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

    [ https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845261#comment-13845261 ]

Sandy Ryza commented on YARN-1404:
----------------------------------

bq. Other than saying you don't want to wait for impala-under-YARN integration, I haven't heard any technical reservations against this approach.

I have no technical reservations with the overall approach. In fact, I'm in favor of it. My points are:
* We will not see this happen for a while, and the original approach on this JIRA supports a workaround that has no consequences for clusters not running Impala on YARN.
* I'm sure many who would love to take advantage of centrally resource-managed HDFS caching will be unwilling to deploy HDFS through YARN. The same goes for all sorts of legacy applications.

If, besides the changes Arun proposed, we can expose YARN's central scheduling independently of its deployment/enforcement, there would be a lot to gain. If this is within easy reach, I don't find the arguments that YARN is philosophically opposed to it, or that the additional freedom would allow cluster-configurers to shoot themselves in the foot, satisfying.

I realize that we are rehashing many of the same arguments, so I'm not sure how to make progress on this. I'll wait until Tucu returns from vacation to push further.
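For readers less familiar with the AM-RM protocol being discussed, the following is a rough sketch (not code from this JIRA or from Llama) of how a framework asks YARN's scheduler for capacity today, using the stock Hadoop 2.2 AMRMClient API. It assumes it is running as an already registered ApplicationMaster with valid RM credentials; the resource sizes, priority, and polling loop are made-up illustration values.

    import java.util.List;

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SchedulingOnlySketch {
      public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        // Register with the ResourceManager as an ApplicationMaster.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        // Ask the scheduler for capacity: 1024 MB and 2 vcores on any node/rack
        // (sizes and priority are arbitrary illustration values).
        Resource capability = Resource.newInstance(1024, 2);
        rmClient.addContainerRequest(
            new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // Poll until the scheduler grants the allocation. Under the current model
        // the caller must then launch a process through the NodeManager, or the
        // allocation is eventually reclaimed -- which is what this JIRA is about.
        List<Container> granted = rmClient.allocate(0.0f).getAllocatedContainers();
        while (granted.isEmpty()) {
          Thread.sleep(1000);
          granted = rmClient.allocate(0.0f).getAllocatedContainers();
        }

        System.out.println("Scheduler granted: " + granted.get(0).getResource());
      }
    }

This is what "leveraging YARN's central scheduling" amounts to from the client side; the issue below is about what has to happen on the NodeManager after the grant.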
> Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1404
>                 URL: https://issues.apache.org/jira/browse/YARN-1404
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes in which its applications run their workload.
> External frameworks/systems could benefit from sharing resources with other Yarn applications while running their workload within long-running processes owned by the external framework (in other words, running their workload outside of the context of a Yarn container process).
> Because Yarn provides robust and scalable resource management, it is desirable for some external systems to leverage the resource governance capabilities of Yarn (queues, capacities, scheduling, access control) while supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster. When a user submits a query, the processing is broken into 'query fragments', which are run in multiple impalad processes leveraging data locality (similar to Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in the impalad, and the impalad shares the host with other services (HDFS DataNode, Yarn NodeManager, HBase Region Server) and Yarn applications (MapReduce tasks).
> To ensure that cluster utilization follows the Yarn scheduler policies and does not overload the cluster nodes, before running a 'query fragment' on a node Impala requests the required amount of CPU and memory from Yarn. Once the requested CPU and memory have been allocated, Impala starts running the 'query fragment', taking care that the 'query fragment' does not use more resources than the ones that have been allocated. Memory is bookkept per 'query fragment', and the threads used for the processing of the 'query fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been requested from the Yarn RM, a (container) process must be started via the corresponding NodeManager. Failing to do this will result in the cancellation of the container allocation, relinquishing the acquired resource capacity back to the pool of available resources. To avoid this, Impala starts a dummy container process doing 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU shares that are not used, and Impala re-issues those CPU shares to another cgroup for the thread running the 'query fragment'. The cgroup CPU enforcement works correctly because of the CPU controller implementation (but the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independent of each other. Some requests may be memory only with no CPU, or vice versa. Because a container requires a process, complete absence of memory or CPU is not possible: even if the dummy process is 'sleep', a minimal amount of memory and CPU is required for it.
> Because of this, it is desirable to be able to have a container without a backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
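To make the 'dummy container' workaround described in the issue above concrete, here is a rough sketch (not from the YARN-1404 patch) of what launching such a placeholder 'sleep 10y' process looks like with the Hadoop 2.2 NMClient API. The Container argument is assumed to come from an AMRMClient allocation like the one sketched earlier, and a real ApplicationMaster would also need the NodeManager tokens from the allocate response; those details are omitted here.

    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.yarn.api.ApplicationConstants;
    import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class DummyContainerSketch {

      /** Starts a placeholder 'sleep 10y' process so the allocation is not reclaimed. */
      static void holdAllocation(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // The only purpose of this process is to keep the container alive;
        // the real work runs inside the long-lived impalad, outside this container.
        List<String> commands = Collections.singletonList(
            "sleep 10y"
                + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
                + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr");

        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            Collections.<String, LocalResource>emptyMap(),   // nothing to localize
            Collections.<String, String>emptyMap(),          // no environment
            commands,
            Collections.<String, ByteBuffer>emptyMap(),      // no aux service data
            null,                                            // no tokens (sketch only)
            Collections.<ApplicationAccessType, String>emptyMap());

        nmClient.startContainer(container, ctx);
      }
    }

The drawbacks listed in the issue follow directly from this pattern: the placeholder process still has to be granted a nonzero amount of memory and CPU, and its cgroup holds CPU shares that the actual query-fragment threads, living in a different cgroup, cannot formally rely on.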