incubator-whirr-user mailing list archives

From Selwyn McCracken <selwyn.mccrac...@gmail.com>
Subject Re: some nodes terminating at startup
Date Thu, 10 Mar 2011 08:25:15 GMT
Thanks Tom.

Will build from the trunk tonight and give it a test (it does appear
to be the same issue as WHIRR-167).

The script hangs on the launch machine. I launched some smaller
clusters, so hopefully this is the relevant section of the log. It is
what was displayed when I had to use Ctrl-Z to recover control of the
terminal so I could destroy the cluster.

--
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
dpkg-preconfigure: unable to re-open stdin:

2011-03-07 20:23:31,518 DEBUG [jclouds.compute] (user thread 11) <<
options applied node(us-east-1/i-851d14e9)
2011-03-07 20:23:31,524 INFO
[org.apache.whirr.cluster.actions.NodeStarter] (pool-1-thread-2) Nodes
started: [[id=us-east-1/i-8b1d14e7, providerId=i-8b1d14e7,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.114.121.62],
publicAddresses=[184.73.9.122], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-891d14e5, providerId=i-891d14e5,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.114.206.253],
publicAddresses=[72.44.38.144], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-871d14eb, providerId=i-871d14eb,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.114.74.91],
publicAddresses=[50.16.96.184], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-8d1d14e1, providerId=i-8d1d14e1,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.212.167.31],
publicAddresses=[174.129.88.235], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-b11d14dd, providerId=i-b11d14dd,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.116.149.144],
publicAddresses=[174.129.74.156], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-8f1d14e3, providerId=i-8f1d14e3,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.114.251.250],
publicAddresses=[67.202.41.42], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-b31d14df, providerId=i-b31d14df,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.116.222.97],
publicAddresses=[75.101.229.142], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-851d14e9, providerId=i-851d14e9,
group=hadoop8l, name=null, location=[id=us-east-1b, scope=ZONE,
description=us-east-1b, parent=us-east-1, iso3166Codes=[US-VA],
metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null,
family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true,
description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
state=RUNNING, loginPort=22, privateAddresses=[10.116.222.165],
publicAddresses=[50.16.23.148], hardware=[id=m1.large,
providerId=m1.large, name=null, processors=[[cores=2.0, speed=2.0]],
ram=7680, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1,
durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0,
device=/dev/sdb, durable=false, isBootDevice=false], [id=null,
type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsImage=is64Bit()], loginUser=ubuntu,
userMetadata={}]]
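
For anyone comparing a "Nodes started" line like the one above with the recipe, here is a minimal Python sketch that pulls the instance id and addresses out of each node record. It assumes the record layout shown in this log (providerId, then privateAddresses, then publicAddresses); the function name and the shortened sample are made up for illustration and are not part of Whirr or jclouds.

```python
import re

# One match per node record. Hardware entries also carry a providerId
# (e.g. m1.large), but only instance ids start with "i-".
NODE_PATTERN = re.compile(
    r"providerId=(i-[0-9a-f]+)"
    r".*?privateAddresses=\[([^\]]*)\]"
    r".*?publicAddresses=\[([^\]]*)\]",
    re.DOTALL,
)


def started_nodes(log_text):
    """Return (instance_id, private_ip, public_ip) tuples found in log_text."""
    # Mail clients hard-wrap the log, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    return NODE_PATTERN.findall(flat)


# Shortened sample in the same shape as the log above (fields trimmed).
sample = """
started: [[id=us-east-1/i-8b1d14e7, providerId=i-8b1d14e7,
state=RUNNING, loginPort=22, privateAddresses=[10.114.121.62],
publicAddresses=[184.73.9.122], hardware=[id=m1.large,
providerId=m1.large, name=null], loginUser=ubuntu,
userMetadata={}], [id=us-east-1/i-891d14e5, providerId=i-891d14e5,
state=RUNNING, loginPort=22, privateAddresses=[10.114.206.253],
publicAddresses=[72.44.38.144], hardware=[id=m1.large,
providerId=m1.large, name=null], loginUser=ubuntu,
userMetadata={}]]
"""

for instance_id, private_ip, public_ip in started_nodes(sample):
    print(instance_id, private_ip, public_ip)
```

The number of tuples returned can then be checked against the number of instances the recipe asks for.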

On Tue, Mar 8, 2011 at 11:39 PM, Tom White <tom.e.white@gmail.com> wrote:
> Hi Selwyn,
>
> https://issues.apache.org/jira/browse/WHIRR-167 should improve
> reliability of larger clusters, but it isn't in a released version yet
> (it's in 0.4.0). You might try building trunk to see if it helps you.
>
> Where does the script hang? On the cloud instance or on the launch
> machine? What's the last thing in the log?
>
> Adding nodes to a running cluster is still under development
> (https://issues.apache.org/jira/browse/WHIRR-214).
>
> Cheers,
> Tom
>
> On Mon, Mar 7, 2011 at 1:51 PM, Selwyn McCracken
> <selwyn.mccracken@gmail.com> wrote:
>> Hi Whirrers,
>>
>> I have been successfully launching smaller clusters with whirr (<= 4
>> data nodes).
>>
>> When I try to scale to something larger (8+ nodes), some of the nodes
>> terminate during the startup process, and frequently it is the name
>> node.
>>
>> I have reviewed the logs and there doesn't seem to be anything I can
>> spot (in fact the whirr script hangs and never closes, so the log
>> never completes).
>>
>> I suspect something is timing out if the cluster is being launched serially...
>>
>> Has there been any progress made in adding nodes to an already running
>> cluster? This might help to work around this problem, and make it
>> easier for my benchmarking tests, where I am trying to show a linear
>> decrease in processing time as the number of nodes increases. That is,
>> I won't have to start a fresh cluster and reload the data into HDFS for
>> each test run.
>>
>> Anyway, here is the recipe I have been using:
>>
>> whirr.cluster-name=hadoop8l
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,8 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.hardware-id=m1.large
>> whirr.image-id=us-east-1/ami-da0cf8b3
>> whirr.location-id=us-east-1
>>
>> Any help greatly appreciated.
>> Selwyn
>>
>
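
As a rough cross-check of the recipe quoted above, the following Python sketch adds up how many instances a whirr.instance-templates value requests. It assumes the value is a comma-separated list of "<count> <role>+<role>" entries, as in the quoted recipe; the helper name is hypothetical and this is not how Whirr itself parses the property.

```python
def expected_instances(templates):
    """Sum the leading counts of a comma-separated instance-templates value."""
    total = 0
    for template in templates.split(","):
        # Each entry looks like "8 hadoop-datanode+hadoop-tasktracker".
        count, _roles = template.strip().split(None, 1)
        total += int(count)
    return total


recipe_value = (
    "1 hadoop-namenode+hadoop-jobtracker,"
    "8 hadoop-datanode+hadoop-tasktracker"
)
print(expected_instances(recipe_value))  # prints 9
```

For the quoted recipe this gives 9 (one namenode/jobtracker plus eight datanode/tasktracker instances).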
