Date: Thu, 10 Mar 2011 08:25:15 +0000
Subject: Re: some nodes terminating at startup
From: Selwyn McCracken
To: Tom White
Cc: whirr-user@incubator.apache.org

Thanks Tom. Will build from the trunk tonight and give it a test (it does appear to be the same issue as WHIRR-167).

The script hangs on the launch machine. I launched some smaller clusters, so hopefully this is the relevant section of the log displayed when I had to use Ctrl-Z to recover control of the terminal so I could destroy the cluster.

--
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
dpkg-preconfigure: unable to re-open stdin:
2011-03-07 20:23:31,518 DEBUG [jclouds.compute] (user thread 11) << options applied node(us-east-1/i-851d14e9)
2011-03-07 20:23:31,524 INFO [org.apache.whirr.cluster.actions.NodeStarter] (pool-1-thread-2) Nodes started: [[id=us-east-1/i-8b1d14e7, providerId=i-8b1d14e7, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.114.121.62], publicAddresses=[184.73.9.122], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false,
isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-891d14e5, providerId=i-891d14e5, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.114.206.253], publicAddresses=[72.44.38.144], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-871d14eb, providerId=i-871d14eb, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.114.74.91], publicAddresses=[50.16.96.184], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, 
device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-8d1d14e1, providerId=i-8d1d14e1, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.212.167.31], publicAddresses=[174.129.88.235], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-b11d14dd, providerId=i-b11d14dd, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.116.149.144], publicAddresses=[174.129.74.156], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-8f1d14e3, 
providerId=i-8f1d14e3, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.114.251.250], publicAddresses=[67.202.41.42], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-b31d14df, providerId=i-b31d14df, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.116.222.97], publicAddresses=[75.101.229.142], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}], [id=us-east-1/i-851d14e9, providerId=i-851d14e9, group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE, description=us-east-1b, parent=us-east-1, 
iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, privateAddresses=[10.116.222.165], publicAddresses=[50.16.23.148], hardware=[id=m1.large, providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu, userMetadata={}]]

On Tue, Mar 8, 2011 at 11:39 PM, Tom White wrote:
> Hi Selwyn,
>
> https://issues.apache.org/jira/browse/WHIRR-167 should improve
> reliability of larger clusters, but it isn't in a released version yet
> (it's in 0.4.0). You might try building trunk to see if it helps you.
>
> Where does the script hang? On the cloud instance or on the launch
> machine? What's the last thing in the log?
>
> Adding nodes to a running cluster is still under development
> (https://issues.apache.org/jira/browse/WHIRR-214).
>
> Cheers,
> Tom
>
> On Mon, Mar 7, 2011 at 1:51 PM, Selwyn McCracken wrote:
>> Hi Whirrers,
>>
>> I have been successfully launching smaller clusters with whirr (<= 4
>> data nodes).
>>
>> When I try to scale to something larger (8+ nodes), some of the nodes
>> terminate during the startup process, and frequently it is the name
>> node.
>>
>> I have reviewed the logs and there doesn't seem to be anything I can
>> spot (in fact the whirr script hangs and never closes, so the log never
>> completes).
>>
>> I suspect something is timing out if the cluster is being launched serially...
>>
>> Has there been any progress made in adding nodes to an already running
>> cluster? This might help to work around this problem, and make it
>> easier for my benchmarking tests, where I am trying to show a linear
>> decrease in processing time as the number of nodes increases. That is,
>> I won't have to start a fresh cluster and reload the data into HDFS for
>> each test run.
>>
>> Anyway, here is the recipe I have been using:
>>
>> whirr.cluster-name=hadoop8l
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,8 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.hardware-id=m1.large
>> whirr.image-id=us-east-1/ami-da0cf8b3
>> whirr.location-id=us-east-1
>>
>> Any help greatly appreciated.
>> Selwyn
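
For anyone comparing a log like the one above against the recipe, here is a minimal Python sketch (not part of Whirr) that counts the instances named in a "Nodes started:" line and reads the requested counts out of a whirr.instance-templates value. The function names and the shortened SAMPLE_LOG string are made up for illustration, and it assumes EC2 instance ids appear as providerId=i-<hex>, as they do in the log above. Whether Whirr writes one "Nodes started:" line per instance template is also an assumption, so treat any comparison as a rough check only.

```python
import re


def template_counts(instance_templates):
    """Map each role group in a whirr.instance-templates value to its requested count."""
    counts = {}
    for template in instance_templates.split(","):
        # Each template looks like "<count> <role>+<role>..."
        count, roles = template.strip().split(None, 1)
        counts[roles.strip()] = int(count)
    return counts


def started_instance_ids(log_line):
    """Return the distinct EC2 instance ids found in a 'Nodes started:' log line, in order."""
    seen = []
    # Matches providerId=i-851d14e9 but not the hardware entry providerId=m1.large
    for instance_id in re.findall(r"providerId=(i-[0-9a-f]+)", log_line):
        if instance_id not in seen:
            seen.append(instance_id)
    return seen


# Value copied from the recipe in this thread.
RECIPE_TEMPLATES = "1 hadoop-namenode+hadoop-jobtracker,8 hadoop-datanode+hadoop-tasktracker"

# Hypothetical, heavily shortened stand-in for a real "Nodes started:" line.
SAMPLE_LOG = (
    "Nodes started: [[id=us-east-1/i-8b1d14e7, providerId=i-8b1d14e7, "
    "hardware=[id=m1.large, providerId=m1.large]], "
    "[id=us-east-1/i-891d14e5, providerId=i-891d14e5, "
    "hardware=[id=m1.large, providerId=m1.large]]]"
)

if __name__ == "__main__":
    print(template_counts(RECIPE_TEMPLATES))
    print(len(started_instance_ids(SAMPLE_LOG)), "instances in the sample line")
```

Pasting the full log line in place of SAMPLE_LOG gives the number of nodes that line reports, which can then be compared by eye with the count requested for the matching template.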