whirr-user mailing list archives

From Benjamin Clark <...@daltonclark.com>
Subject Re: aws 64-bit c1.xlarge problems
Date Fri, 18 Mar 2011 05:16:02 GMT
Andrei,

The release candidate code does work.  Perhaps something was different relative to the patched
Frankenstein build I was using, or perhaps I had some local corruption or a config problem.

By default it sets up everything as my local user, and overriding that via whirr.cluster-user
works as well.
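
For reference, the override is a single line in the properties file (the value here matches
my local user; substitute your own):

whirr.cluster-user=ben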

In any case, at the rate AWS seems to be changing the configuration of 'amazon linux', perhaps
it's less useful than I thought.  Last week the default AMIs in the console had a bunch of
spare disk space on the /media/ephemeral0 partition, which I could symlink /mnt to in the
install_cdh_hadoop.sh script, so that HDFS would have a decent amount of space.  Now there
is no such partition, so I suppose I would have to launch an EBS volume per node and mount
that.  This is tipping over into the "too much trouble" zone for me.  And in the meantime I
got all my native stuff (hadoop-lzo and R/Rhipe) working on Ubuntu, so I think I'm going to
use the Alestic image from the recipe for a while.  If there's an obvious candidate out there
for a "reasonably modern Red Hat derivative AMI from a reputable source that behaves well,"
I'd like to know what it is.  By 'reasonably modern' I mean having a default Python >= 2.5.
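
For reference, the symlink trick in my install_cdh_hadoop.sh was roughly the following (a
sketch from memory; it assumes the AMI still mounts the ephemeral disk at /media/ephemeral0
and that /mnt starts out as an empty directory):

# run as root early in install_cdh_hadoop.sh, before the HDFS data dirs are created
if [ -d /media/ephemeral0 ]; then
    rmdir /mnt 2>/dev/null                          # succeeds only if /mnt is an empty directory
    [ -d /mnt ] || ln -s /media/ephemeral0 /mnt     # point /mnt at the big ephemeral disk
fi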

I liked the old custom of having /mnt be a separate partition of a decent size.  I hope this
is just a glitch on AWS's side.  The space does still seem to be attached, since jclouds/whirr
shows (e.g.) in its output:
volumes=[[id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
[id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]]
So in theory the disk space is still there on those non-boot, non-durable devices, but
I cannot mount them.
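
(For the record, this is roughly what I tried; the device names come from the jclouds output
above, though inside the instance they may show up as /dev/xvdb and /dev/xvdc instead, and
ephemeral disks sometimes arrive with no filesystem on them:)

sudo mkfs -t ext3 /dev/sdb   # give the ephemeral disk a filesystem; may need -F for a whole device
sudo mkdir -p /mnt
sudo mount /dev/sdb /mnt
df -h /mnt                   # should show ~420 GB if the disk is really usable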


I also tried the cluster AMI, because I am intrigued by the performance possibilities.
Sounds great for Hadoop, doesn't it?  But it won't even start the nodes, failing with this:

Configuring template
Unexpected error while starting 1 nodes, minimum 1 nodes for [hadoop-namenode, hadoop-jobtracker]
of cluster bhcLA
java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command:
POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad
Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only
be used with Cluster Compute instance types.]
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.waitForOutcomes(BootstrapClusterAction.java:307)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:260)
	at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:221)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/
HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a
virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
	at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)

There must be something a bit more involved about specifying cluster instances in the Amazon
API, perhaps not (yet) supported by jclouds?  (Though re-reading my properties file, I notice
I typed whirr.hardward-id rather than whirr.hardware-id, so the cc1.4xlarge setting may never
have been applied; that would also explain jclouds falling back to m1.small in the 'hardwares
don't support any images' error quoted below.)  I'm afraid I don't need this enough right now
to justify digging further.
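
For anyone else who wants to try, the two settings I intended (with the property name spelled
correctly this time) would be:

whirr.hardware-id=cc1.4xlarge
whirr.image-id=us-east-1/ami-321eed5b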


Anyway, thanks for all your help and advice on this.

--Ben


On Mar 17, 2011, at 7:01 PM, Andrei Savu wrote:

> Strange! I will try your properties file tomorrow.
> 
> If you want to try again you can find the artifacts for 0.4.0 RC1 here:
> http://people.apache.org/~asavu/whirr-0.4.0-incubating-candidate-1
> 
> On Thu, Mar 17, 2011 at 8:41 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>> Andrei,
>> 
>> Thanks for looking at this.  Unfortunately it does not seem to work.
>> 
>> Using the Amazon linux 64-bit ami with no whirr.cluster-user, or if I set it to 'ben'
>> or whatever else, I get this.
>> 
>> 1) SshException on node us-east-1/i-62de280d:
>> org.jclouds.ssh.SshException: ec2-user@72.44.35.254:22: Error connecting to session.
>>        at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>        at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>        at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>> 
>> So it doesn't seem to be honoring that property, and it's definitely not allowing
>> me to log in to any node as 'ben', 'ec2-user' or 'root'.
>> 
>> The ubuntu ami from the recipes continues to work fine.
>> 
>> Here's the full config file I'm using.  I grabbed the recipe from trunk and put my
>> stuff back in, to make sure I'm not missing a new setting:
>> 
>> whirr.cluster-name=bhcTL
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-hkey
>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-hkey.pub
>> whirr.cluster-user=ben
>> # Amazon linux 32-bit--works
>> #whirr.hardware-id=c1.medium
>> #whirr.image-id=us-east-1/ami-d59d6bbc
>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/ -- works
>> #whirr.hardware-id=c1.xlarge
>> #whirr.image-id=us-east-1/ami-da0cf8b3
>> # Amazon linux 64-bit as of 3/11:--doesn't work
>> whirr.hardware-id=c1.xlarge
>> whirr.image-id=us-east-1/ami-8e1fece7
>> #Cluster compute --doesn't work
>> #whirr.hardward-id=cc1.4xlarge
>> #whirr.image-id=us-east-1/ami-321eed5b
>> whirr.location-id=us-east-1d
>> hadoop-hdfs.dfs.permissions=false
>> hadoop-hdfs.dfs.replication=2
>> 
>> 
>> --Ben
>> 
>> 
>> 
>> 
>> On Mar 17, 2011, at 1:08 PM, Andrei Savu wrote:
>> 
>>> Ben,  could you give it one more try using the current trunk?
>>> 
>>> You can specify the user by setting the option whirr.cluster-user
>>> (defaults to current system user).
>>> 
>>> On Wed, Mar 16, 2011 at 11:23 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>>>> Andrei,
>>>> 
>>>> Thanks.
>>>> 
>>>> After patching with 158, it launches fine as me on that Ubuntu image from
>>>> the recipe (i.e. on my client machine I am 'ben', so now the aws user that has sudo, and as
>>>> whom I can log in, is also 'ben'), so that looks good.
>>>> 
>>>> But it's now doing this with amazon linux (ami-8e1fece7, which was the default
>>>> 64-bit ami a few days ago, and may still be) during launch:
>>>> 
>>>> 1) SshException on node us-east-1/i-b2678ddd:
>>>> org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
>>>>        at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>>>        at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>>>        at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>>>        at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>>>        at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
>>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>> 
>>>> So it seems the SSH key part of the jclouds authentication setup is still failing
>>>> in the amazon linux/ec2-user scenario, i.e. it tries to set things up as the local user, but fails.
>>>> 
>>>> Is there a property for the user it launches as?  Or does it just do whichever
>>>> user you are locally, instead of ec2-user/ubuntu/root, depending on the default, as before?
>>>> 
>>>> I can switch to ubuntu, but I have a fair amount of native code setup in
>>>> my custom scripts and would prefer to stick with a redhattish version if possible.
>>>> 
>>>> Looking ahead, I want to benchmark plain old 64-bit instances against cluster
>>>> instances, to see if the allegedly improved networking gives us a boost, and the available
>>>> ones I see are Suse and Amazon linux.  When I switch to the amazon linux one, like so:
>>>> 
>>>> whirr.hardward-id=cc1.4xlarge
>>>> whirr.image-id=us-east-1/ami-321eed5b
>>>> 
>>>> I get a different problem:
>>>> 
>>>> Exception in thread "main" java.util.NoSuchElementException: hardwares don't
>>>> support any images: [biggest=false, fastest=false, imageName=null, imageDescription=Amazon
>>>> Linux AMI x86_64 HVM EBS EXT4, imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1,
>>>> scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA], metadata={}], minCores=0.0,
>>>> minRam=0, osFamily=unrecognized, osName=null, osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4,
>>>> osVersion=, osArch=hvm, os64Bit=true, hardwareId=m1.small]
>>>> [[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null, processors=[[cores=4.0,
>>>> speed=4.0], [cores=4.0, speed=4.0]], ram=23552, volumes=[[id=null, type=LOCAL, size=10.0,
>>>> device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb,
>>>> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0, device=/dev/sdc, durable=false,
>>>> isBootDevice=false]], supportsI
>>>> 
>>>> but I imagine that if using cluster instances is going to be possible, support
>>>> for amazon linux will be needed.
>>>> 
>>>> --Ben
>>>> 
>>>> 
>>>> On Mar 16, 2011, at 4:07 PM, Andrei Savu wrote:
>>>> 
>>>>> I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
>>>>> going to commit WHIRR-158 [1] tomorrow and it should fix the problem
>>>>> you are seeing. We should be able to restart the vote for the 0.4.0
>>>>> release after fixing this issue.
>>>>> 
>>>>> [0] https://issues.apache.org/jira/browse/WHIRR-264
>>>>> [1] https://issues.apache.org/jira/browse/WHIRR-158
>>>>> 
>>>>> -- Andrei Savu / andreisavu.ro
>>>>> 
>>>>> On Wed, Mar 16, 2011 at 6:54 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>>>>>> I have been using whirr 0.4 branch to launch clusters of c1.medium
>>>>>> amazon linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was the default for new
>>>>>> amazon linux instances, a few days ago) with good success.  I took the default hadoop-ec2.properties
>>>>>> recipe and modified it slightly to suit my needs.  I'm now trying with basically the same
>>>>>> properties file, but when I use
>>>>>> 
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>> 
>>>>>> and then either this (from the recipe)
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>> 
>>>>>> or this:
>>>>>> # Amazon linux 64-bit, default as of 3/11:
>>>>>> whirr.image-id=us-east-1/ami-8e1fece7
>>>>>> 
>>>>>> I get a failure to install the right public key, so that I can't
>>>>>> log into the name node (or any other nodes, for that matter).
>>>>>> 
>>>>>> 
>>>>>> My whole config file is this:
>>>>>> 
>>>>>> whirr.cluster-name=bhcL4
>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
>>>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>>>> whirr.provider=aws-ec2
>>>>>> whirr.identity=...
>>>>>> whirr.credential=...
>>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
>>>>>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>> #whirr.hardware-id=c1.medium
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>> # Amazon linux as of 3/11:
>>>>>> #whirr.image-id=us-east-1/ami-8e1fece7
>>>>>> # If you choose a different location, make sure whirr.image-id is updated too
>>>>>> whirr.location-id=us-east-1d
>>>>>> hadoop-hdfs.dfs.permissions=false
>>>>>> hadoop-hdfs.dfs.replication=2
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Am I doing something wrong here?  I tried with whirr.location-id=us-east-1d
>>>>>> and whirr.location-id=us-east-1
>>>> 
>>>> 
>> 
>> 

