X-Original-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-yarn-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9C24CCEAF; Fri, 13 Sep 2013 15:57:27 +0000 (UTC)
Received: (qmail 30225 invoked by uid 500); 13 Sep 2013 08:53:07 -0000
Delivered-To: apmail-hadoop-yarn-issues-archive@hadoop.apache.org
Received: (qmail 30111 invoked by uid 500); 13 Sep 2013 08:53:02 -0000
Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Delivered-To: mailing list yarn-issues@hadoop.apache.org
Received: (qmail 29892 invoked by uid 99); 13 Sep 2013 08:52:51 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 13 Sep 2013 08:52:51 +0000
Date: Fri, 13 Sep 2013 08:52:51 +0000 (UTC)
From: "Wangda Tan (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Created] (YARN-1197) Add container merge support in YARN
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Wangda Tan created YARN-1197:
--------------------------------

             Summary: Add container merge support in YARN
                 Key: YARN-1197
                 URL: https://issues.apache.org/jira/browse/YARN-1197
             Project: Hadoop YARN
          Issue Type: Task
          Components: api, nodemanager, resourcemanager
    Affects Versions: 2.1.0-beta
            Reporter: Wangda Tan

Currently, YARN cannot merge several containers on one node into one big container. Such support would let us request resources incrementally, merge them into a bigger container, and launch our processes.

The user scenario is this: some applications (like OpenMPI) have their own daemon on each node (one per node) in their original implementation, and their user processes are launched directly by the local daemon (like the task-tracker in MRv1, but per application). Many features depend on the pipes created when a process is forked by its parent, such as IO forwarding and process monitoring (which does more than what the NM does for us); launching the processes any other way may also cause some scalability issues.

A very common resource request in the MPI world is, "give me 100G of memory in the cluster, and I will launch 100 processes in it". In current YARN, we have the following two choices to make this happen:

1) Send allocation requests for 1G of memory iteratively, until we have 100G of memory in total. Then ask the NMs to launch the 100 MPI processes. This causes problems such as no support for IO forwarding, process monitoring, etc., as mentioned above.
2) Send a larger resource request, like 10G. But we may encounter the following problems:
   2.1 Such a large resource request is hard to satisfy in one allocation.
   2.2 We cannot use resources on the node beyond the amount we specified (we can only launch one daemon per node).
   2.3 It is hard to decide how much resource to ask for.

So my proposal is:

1) Incrementally send resource requests for small amounts of resources, as before, until we have enough resources in total.
2) Merge the resources on the same node, so that there is only one big container on each node.
3) Launch a daemon on each node; the daemon will spawn its local processes and manage them.

For example, we need to run 10 processes, 1G each, and we finally get containers 1, 2, 3, 4, 5 on node1; containers 6, 7, 8 on node2; and containers 9, 10 on node3.

Then we will:

- merge [1, 2, 3, 4, 5] into container_11 with 5G, launch a daemon, and the daemon will launch 5 processes
- merge [6, 7, 8] into container_12 with 3G, launch a daemon, and the daemon will launch 3 processes
- merge [9, 10] into container_13 with 2G, launch a daemon, and the daemon will launch 2 processes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
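
The per-node grouping behind the merge example in the issue description can be sketched as follows. This is only an illustration under stated assumptions, not YARN code: the function `plan_merges`, the tuple layout `(container_id, node, memory_gb)`, and the idea that the merged container gets the next unused numeric id are all hypothetical, and the sketch assumes one process per 1G of merged memory, as in the example.

```python
from collections import defaultdict

def plan_merges(allocations, next_id):
    """Group allocated containers by node and describe one merged container per node.

    allocations: list of (container_id, node, memory_gb) tuples (hypothetical layout).
    next_id: first id to assign to a merged container (hypothetical numbering scheme).
    """
    by_node = defaultdict(list)
    for container_id, node, memory_gb in allocations:
        by_node[node].append((container_id, memory_gb))

    plan = []
    for node in sorted(by_node):  # deterministic order for the illustration
        members = by_node[node]
        total_gb = sum(mem for _, mem in members)
        plan.append({
            "merged_id": next_id,
            "node": node,
            "sources": [cid for cid, _ in members],
            "memory_gb": total_gb,
            # Assumption: the daemon launches one process per 1G, as in the example.
            "processes": total_gb,
        })
        next_id += 1
    return plan

# The allocation from the example: ten 1G containers spread over three nodes.
allocations = (
    [(i, "node1", 1) for i in range(1, 6)]
    + [(i, "node2", 1) for i in range(6, 9)]
    + [(i, "node3", 1) for i in range(9, 11)]
)

for entry in plan_merges(allocations, next_id=11):
    print(
        f"merge {entry['sources']} into container_{entry['merged_id']} "
        f"with {entry['memory_gb']}G on {entry['node']}, "
        f"daemon launches {entry['processes']} processes"
    )
```

The sketch covers only the bookkeeping on the application side; how the RM and NM would actually perform and account for a merge is exactly what this issue leaves open.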