Date: Fri, 11 Apr 2008 09:46:05 -0700 (PDT)
From: "Nate Carlson (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2410) Make EC2 cluster nodes more independent of each other
Message-ID: <1827158566.1207932365096.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588022#action_12588022 ]

Nate Carlson commented on HADOOP-2410:
--------------------------------------

One change I've had to make is to add the
memory option for the processes the cluster launches in the hadoop-site.xml that gets generated. This would probably be a good thing to make configurable for the end user.

I have also manually installed nph-proxy on the master node, with HTTP authentication; it makes it much easier to get around the slave nodes.

> Make EC2 cluster nodes more independent of each other
> -----------------------------------------------------
>
>                 Key: HADOOP-2410
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2410
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/ec2
>    Affects Versions: 0.16.1
>            Reporter: Tom White
>            Assignee: Chris K Wensel
>             Fix For: 0.17.0
>
>         Attachments: concurrent-clusters-2.patch, concurrent-clusters-3.patch, concurrent-clusters.patch, ec2.tgz
>
>
> The cluster startup scripts currently wait for each node to start up before appointing a master (to run the namenode and jobtracker on), copying private keys to all the nodes, and writing the private IP address of the master to the hadoop-site.xml file (which is then copied to the slaves via rsync). Only once this is all done is hadoop started on the cluster (from the master). This can fail if any of the nodes fails to come up, which can happen as EC2 doesn't guarantee that you get a cluster of the size you ask for (I've seen this happen).
> The process would be more robust if each node was told the address of the master as user metadata and then started its own daemons. This is complicated by the fact that the public DNS alias of the master resolves to a public IP address so cannot be used by EC2 nodes (see http://docs.amazonwebservices.com/AWSEC2/2007-08-29/DeveloperGuide/instance-addressing.html).
> Instead we need to use a trick (http://developer.amazonwebservices.com/connect/message.jspa?messageID=71126#71126) to find the private IP, and what's more we need to attempt to resolve the private IP in a loop until it is available, since the DNS will only be set up after the master has started.
> This change will also mean the private key doesn't need to be copied to each node, which can be slow and has dubious security. Configuration can be handled using the mechanism described in HADOOP-2409.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
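
The issue description asks each node to retry resolving the master's private IP until DNS is available. As a rough illustration only, that retry loop might look like the Python sketch below. It is not taken from any of the attached patches and does not reproduce the "trick" from the linked forum post. The function name `resolve_when_ready`, its parameters, and the example hostname are hypothetical, and it assumes the node has already been handed a hostname (for example via user metadata) that will resolve to the master's private IP once the master is up.

```python
import socket
import time


def resolve_when_ready(hostname, attempts=30, delay=5.0,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Retry DNS resolution of `hostname` until it succeeds.

    Hypothetical helper: returns the resolved IPv4 address as a string,
    or re-raises the last lookup error once `attempts` lookups have failed.
    `resolver` and `sleep` are injectable so the loop can be exercised
    without real DNS or real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return resolver(hostname)
        except OSError:
            # socket.gaierror is a subclass of OSError. The name is not
            # resolvable yet, presumably because the master has not
            # finished starting.
            if attempt == attempts:
                raise
            sleep(delay)


if __name__ == "__main__":
    # Assumed usage: the hostname would come from the instance's user
    # metadata; "localhost" is only a stand-in so the script runs anywhere.
    print(resolve_when_ready("localhost", attempts=3, delay=1.0))
```

Bounding the number of attempts keeps a slave from waiting forever if the master never comes up, which is the failure mode the description is trying to make less painful.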