From: Harsh J
Date: Thu, 11 Apr 2013 18:26:17 +0530
Subject: Re: communication path for task assignment in hadoop 2.X
To: user@hadoop.apache.org

Hi Hari,

My response inline.

On Thu, Apr 11, 2013 at 11:51 AM, hari wrote:
> Hi list,
>
> I was looking at the node communication (in hadoop-2.0.3-alpha)
> responsible for task assignment. So
> far I saw that the resourcemanager and nodemanager
> exchange heartbeat request/response.

Very glad you're checking out YARN. Do provide feedback on what we can
improve over the YARN JIRA when you get to try it!

> I have a couple of questions on this:
>
> 1. I saw that the heartbeat response to nodemanager does
> not include new task assignment. I think in the previous
> versions (eg. 0.20.2) the task assignment was piggybacked in the
> heartbeat response to the tasktracker. In the v2.0.3, so far
> I could only see that the response can trigger nodemanager
> shutdown, reboot, app cleanup, and container cleanup. Are there
> any other actions being triggered on the nodemanager
> by the heartbeat response ?

Note: in YARN, the new term for 'Task' is 'Container'.

Yes, the heartbeats between the NodeManager and the ResourceManager no
longer carry container assignments. Container launches are handled by a
separate NodeManager-embedded service called the ContainerManager [1].
You can read everything a heartbeat currently does at a NodeManager in
the code of its NodeStatusUpdater service at [2].
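To make the point about heartbeats concrete, here is a minimal toy model
of a NodeManager reacting to a heartbeat response. It is only a sketch,
written in Python for brevity, and it is not the YARN API: NodeAction,
HeartbeatResponse and handle_heartbeat_response are hypothetical names,
and the assumption that shutdown/reboot take priority over cleanup is
mine. The four actions are the ones listed in the question; the thing to
notice is that the response has no field for new container assignments.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NodeAction(Enum):
    """Hypothetical node-level commands the RM can send back."""
    NORMAL = "normal"
    SHUTDOWN = "shutdown"
    REBOOT = "reboot"


@dataclass
class HeartbeatResponse:
    """Toy stand-in for the RM's heartbeat reply (not the real class).

    There is deliberately no 'containers to launch' field here.
    """
    action: NodeAction = NodeAction.NORMAL
    containers_to_cleanup: List[str] = field(default_factory=list)
    applications_to_cleanup: List[str] = field(default_factory=list)


def handle_heartbeat_response(resp: HeartbeatResponse) -> List[str]:
    """Return the steps this toy NodeManager would take for one response."""
    # Assumption: a shutdown or reboot command overrides any cleanup work.
    if resp.action is NodeAction.SHUTDOWN:
        return ["shutdown"]
    if resp.action is NodeAction.REBOOT:
        return ["reboot"]
    steps = [f"cleanup container {c}" for c in resp.containers_to_cleanup]
    steps += [f"cleanup application {a}" for a in resp.applications_to_cleanup]
    return steps
```

Calling handle_heartbeat_response(HeartbeatResponse()) returns an empty
list: an ordinary heartbeat asks the node to do nothing new.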
> 2. Is there a different communication path for task assignment ?
> Is the scheduler making the remote calls or are there other classes outside
> of yarn responsible for making the remote calls ?

The latter. The Scheduler is no longer responsible for asking
NodeManagers to launch containers. An ApplicationMaster just asks the
Scheduler to reserve it some containers with specified resource
(Memory/CPU/etc.) allocations on available or preferred NodeManagers.
Once the ApplicationMaster gets a response back that the allocation
succeeded, it communicates directly with the ContainerManager on the
allocated NodeManager to launch the command needed to run the 'task'.

A good-enough example to read would be the DistributedShell example.
I've linked [3] to show where the above AppMaster -> ContainerManager
request happens in its ApplicationMaster implementation, which should
help clear this up for you.

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L392
[startContainer, stopContainer, getContainerStatus etc. are all
protocol-callable calls from an Application Master]

[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java#L433
[Link is to the heartbeat retry loop; the response processing follows
right after the loop]

[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L756
[See the whole method, i.e. read upward from this line, which is just
the final RPC call, and you can see how we build the
command/environment to launch our 'task']

--
Harsh J
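
The flow in the answer to question 2 (ask the Scheduler for an
allocation, then contact the ContainerManager on the allocated node
yourself) can be sketched as a toy model. This is only an illustration
in Python, not the YARN API: ToyScheduler, ToyContainerManager,
run_application_master and the memory-only resource model are all
hypothetical. The real calls are in the Java sources linked at [1] and
[3].

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Container:
    """Toy allocation record: which node, and how much memory."""
    container_id: str
    node: str
    memory_mb: int


class ToyScheduler:
    """Hypothetical scheduler: it only hands out allocations, never launches."""

    def __init__(self, free_memory_mb: Dict[str, int]):
        self.free_memory_mb = dict(free_memory_mb)
        self._next_id = 0

    def allocate(self, memory_mb: int, preferred_nodes: List[str]) -> Optional[Container]:
        # Try the preferred nodes first, then any other node with room.
        others = [n for n in self.free_memory_mb if n not in preferred_nodes]
        for node in list(preferred_nodes) + others:
            if self.free_memory_mb.get(node, 0) >= memory_mb:
                self.free_memory_mb[node] -= memory_mb
                self._next_id += 1
                return Container(f"container_{self._next_id}", node, memory_mb)
        return None  # nothing fits right now


class ToyContainerManager:
    """Hypothetical per-node service that the ApplicationMaster calls directly."""

    def __init__(self, node: str):
        self.node = node
        self.launched: List[Tuple[str, str]] = []

    def start_container(self, container: Container, command: str) -> None:
        if container.node != self.node:
            raise ValueError("container was allocated on a different node")
        self.launched.append((container.container_id, command))


def run_application_master(scheduler: ToyScheduler,
                           container_managers: Dict[str, ToyContainerManager],
                           command: str, memory_mb: int,
                           preferred_nodes: List[str]) -> Optional[Container]:
    """Step 1: get an allocation. Step 2: launch it on the node ourselves."""
    container = scheduler.allocate(memory_mb, preferred_nodes)
    if container is None:
        return None
    container_managers[container.node].start_container(container, command)
    return container
```

The design point the sketch tries to show is the split of duties: the
scheduler object never touches a ContainerManager, and the launch
command never passes through the scheduler.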