Andrei,
The release candidate code does work. Perhaps something is different, relative to the patched
frankenstein I was using, perhaps I had some local corruption or config problem.
It sets up everything as whatever my local user is, by default, and the override via whirr.cluster-user
works as well.
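The fallback order described here (local user by default, whirr.cluster-user as the override) can be pictured with a small Python sketch. The function name and the dict standing in for the parsed properties file are hypothetical; this is not Whirr's code, only an illustration of the behaviour.

```python
import getpass

def effective_cluster_user(props):
    """Return the whirr.cluster-user override if set, else the local login name."""
    # `props` is a plain dict standing in for the parsed properties file.
    override = props.get("whirr.cluster-user", "").strip()
    # An empty or missing override falls back to whoever is running the client.
    return override or getpass.getuser()

print(effective_cluster_user({"whirr.cluster-user": "ben"}))
```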
In any case, at the rate AWS seems to be changing the configuration of 'amazon linux' perhaps
it's less useful than I thought. Last week the default amis in the console had a bunch of
spare disk space on the /media/ephemeral0 partition, which I could symlink /mnt to in the
install_cdh_hadoop.sh script, and then hdfs would have a decent amount of space. Now there
is no such thing, so I suppose I would have to launch an ebs volume per node and mount that.
This is now tipping over into the "too much trouble" zone for me. And in the meantime I
got all my native stuff (hadoop-lzo and R/Rhipe) working on ubuntu, so I think I'm going to
use the Alestic image from the recipe for a while. If there's an obvious candidate up there
for "reasonably-modern redhat derivative ami from a source on the good lists that behaves
well," I'd like to know what it is. By 'reasonably modern' I mean having default python >=
2.5.
I liked the old custom of having /mnt be a separate partition of a decent size. I hope this
is just a glitch with AWS. I suspect it may be because jclouds/whirr is showing (e.g.) in
the output:
volumes=[[id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
[id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false]
So theoretically the disk space is still there on those non-boot, non-durable devices, but
I cannot mount them.
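To see on a running node whether /mnt or /media/ephemeral0 is really a separate filesystem, and how much room it has, something like the following could be run there. It is only a sketch: it assumes Python 3.3 or later on the node (for shutil.disk_usage), the function name is made up, and it only reports on paths that exist, so it says nothing about unmounted devices such as /dev/sdb.

```python
import os
import shutil

def mount_report(paths):
    """Return (path, is_mount_point, free_gib) for each candidate path."""
    report = []
    for path in paths:
        if not os.path.exists(path):
            # Missing path: nothing to measure.
            report.append((path, False, None))
            continue
        # Free space of the filesystem that holds this path, in GiB.
        free_gib = shutil.disk_usage(path).free / 2**30
        report.append((path, os.path.ismount(path), round(free_gib, 1)))
    return report

if __name__ == "__main__":
    for row in mount_report(["/", "/mnt", "/media/ephemeral0"]):
        print(row)
```

If /mnt shows up as not being a mount point and reports the same free space as /, it is just a directory on the boot volume, which would match the lack of space for hdfs.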
I also tried the cluster ami, because I am intrigued by the possibilities for good performance.
Sounds great for hadoop, doesn't it? But it won't even start the nodes, giving this:
Configuring template
Unexpected error while starting 1 nodes, minimum 1 nodes for [hadoop-namenode, hadoop-jobtracker]
of cluster bhcLA
java.util.concurrent.ExecutionException: org.jclouds.http.HttpResponseException: command:
POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with response: HTTP/1.1 400 Bad
Request; content: [Non-Windows AMIs with a virtualization type of 'hvm' currently may only
be used with Cluster Compute instance types.]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.waitForOutcomes(BootstrapClusterAction.java:307)
at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:260)
at org.apache.whirr.cluster.actions.BootstrapClusterAction$StartupProcess.call(BootstrapClusterAction.java:221)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.jclouds.http.HttpResponseException: command: POST https://ec2.us-east-1.amazonaws.com/
HTTP/1.1 failed with response: HTTP/1.1 400 Bad Request; content: [Non-Windows AMIs with a
virtualization type of 'hvm' currently may only be used with Cluster Compute instance types.]
at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)
There must be something a bit more involved to specify cluster instances in the amazon api,
perhaps not (yet) supported by jclouds? I'm afraid I don't need this enough right now to
justify digging further.
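One detail in the quoted configs further down may be relevant: the cluster compute lines spell the key whirr.hardward-id rather than whirr.hardware-id. Whether that spelling was in the file actually used for this run is not stated, and how Whirr treats an unrecognized key is an assumption here, although the hardwareId=m1.small in the earlier error dump would be consistent with the setting being ignored. A near-miss check along these lines would flag it. The list of known keys is taken only from the configs in this thread, not from Whirr's documentation, and the function name is hypothetical.

```python
import difflib

# Keys seen in the configs quoted in this thread; NOT a complete list
# of what Whirr accepts.
KNOWN_KEYS = {
    "whirr.cluster-name", "whirr.instance-templates",
    "whirr.hadoop-install-function", "whirr.hadoop-configure-function",
    "whirr.provider", "whirr.identity", "whirr.credential",
    "whirr.private-key-file", "whirr.public-key-file",
    "whirr.cluster-user", "whirr.hardware-id", "whirr.image-id",
    "whirr.location-id",
}

def suspicious_keys(properties_text, known=KNOWN_KEYS):
    """Return (key, closest_known_key_or_None) for unknown whirr.* keys."""
    findings = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if not key.startswith("whirr.") or key in known:
            continue
        close = difflib.get_close_matches(key, sorted(known), n=1, cutoff=0.8)
        findings.append((key, close[0] if close else None))
    return findings

sample = "whirr.hardward-id=cc1.4xlarge\nwhirr.image-id=us-east-1/ami-321eed5b\n"
print(suspicious_keys(sample))
```

Run against the two cluster compute lines, this reports whirr.hardward-id with whirr.hardware-id as the closest known key.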
Anyway, thanks for all your help and advice on this.
--Ben
On Mar 17, 2011, at 7:01 PM, Andrei Savu wrote:
> Strange! I will try your properties file tomorrow.
>
> If you want to try again you can find the artifacts for 0.4.0 RC1 here:
> http://people.apache.org/~asavu/whirr-0.4.0-incubating-candidate-1
>
> On Thu, Mar 17, 2011 at 8:41 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>> Andrei,
>>
>> Thanks for looking at this. Unfortunately it does not seem to work.
>>
>> Using the Amazon linux 64-bit ami with no whirr.cluster-user, or if I set it to 'ben'
>> or whatever else, I get this.
>>
>> 1) SshException on node us-east-1/i-62de280d:
>> org.jclouds.ssh.SshException: ec2-user@72.44.35.254:22: Error connecting to session.
>> at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>> at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>> at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>> at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>
>> So it doesn't seem to be honoring that property, and it's definitely not allowing
>> me to log in to any nodes as 'ben', 'ec2-user' or 'root'.
>>
>> The ubuntu ami from the recipes continues to work fine.
>>
>> Here's the full config file I'm using. I grabbed the recipe from trunk and put my
>> stuff back in, to make sure I'm not missing a new setting:
>>
>> whirr.cluster-name=bhcTL
>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
>> whirr.hadoop-install-function=install_cdh_hadoop
>> whirr.hadoop-configure-function=configure_cdh_hadoop
>> whirr.provider=aws-ec2
>> whirr.identity=${env:AWS_ACCESS_KEY_ID}
>> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-hkey
>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-hkey.pub
>> whirr.cluster-user=ben
>> # Amazon linux 32-bit--works
>> #whirr.hardware-id=c1.medium
>> #whirr.image-id=us-east-1/ami-d59d6bbc
>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/ -- works
>> #whirr.hardware-id=c1.xlarge
>> #whirr.image-id=us-east-1/ami-da0cf8b3
>> # Amazon linux 64-bit as of 3/11:--doesn't work
>> whirr.hardware-id=c1.xlarge
>> whirr.image-id=us-east-1/ami-8e1fece7
>> #Cluster compute --doesn't work
>> #whirr.hardward-id=cc1.4xlarge
>> #whirr.image-id=us-east-1/ami-321eed5b
>> whirr.location-id=us-east-1d
>> hadoop-hdfs.dfs.permissions=false
>> hadoop-hdfs.dfs.replication=2
>>
>>
>> --Ben
>>
>>
>>
>>
>> On Mar 17, 2011, at 1:08 PM, Andrei Savu wrote:
>>
>>> Ben, could you give it one more try using the current trunk?
>>>
>>> You can specify the user by setting the option whirr.cluster-user
>>> (defaults to current system user).
>>>
>>> On Wed, Mar 16, 2011 at 11:23 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>>>> Andrei,
>>>>
>>>> Thanks.
>>>>
>>>> After patching with 158, it launches fine as me on that Ubuntu image from
>>>> the recipe (i.e. on my client machine I am 'ben', so now the aws user that has sudo, and as
>>>> whom I can log in is also 'ben'), so that looks good.
>>>>
>>>> But it's now doing this with amazon linux (ami-da0cf8b3, which was the default
>>>> 64-bit ami a few days ago, and may still be) during launch:
>>>>
>>>> 1) SshException on node us-east-1/i-b2678ddd:
>>>> org.jclouds.ssh.SshException: ben@50.16.96.211:22: Error connecting to session.
>>>> at org.jclouds.ssh.jsch.JschSshClient.propagate(JschSshClient.java:252)
>>>> at org.jclouds.ssh.jsch.JschSshClient.connect(JschSshClient.java:206)
>>>> at org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call(RunScriptOnNodeAsInitScriptUsingSsh.java:90)
>>>> at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:70)
>>>> at org.jclouds.compute.strategy.RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(RunScriptOnNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:45)
>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>
>>>> So it seems as if the key part of jclouds authentication setup is still failing
>>>> for the amazon linux/ec2-user scenario, i.e. trying to set up as the local user, but failing.
>>>>
>>>> Is there a property for the user it launches as? Or does it just do whichever
>>>> user you are locally, instead of ec2-user/ubuntu/root, depending on the default, as before?
>>>>
>>>> I can switch to ubuntu, but I have a fair amount of native code setup in
>>>> my custom scripts and would prefer to stick with a redhattish version if possible.
>>>>
>>>> Looking ahead, I want to benchmark plain old 64-bit instances against cluster
>>>> instances, to see if the allegedly improved networking gives us a boost, and the available
>>>> ones I see are Suse and Amazon linux. When I switch to the amazon linux one, like so:
>>>>
>>>> whirr.hardward-id=cc1.4xlarge
>>>> whirr.image-id=us-east-1/ami-321eed5b
>>>>
>>>> I get a different problem:
>>>>
>>>> Exception in thread "main" java.util.NoSuchElementException: hardwares don't
support any images: [biggest=false, fastest=false, imageName=null, imageDescription=Amazon
Linux AMI x86_64 HVM EBS EXT4, imageId=us-east-1/ami-321eed5b, imageVersion=ext4, location=[id=us-east-1,
scope=REGION, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA], metadata={}], minCores=0.0,
minRam=0, osFamily=unrecognized, osName=null, osDescription=amazon/amzn-hvm-ami-2011.02.1-beta.x86_64-ext4,
osVersion=, osArch=hvm, os64Bit=true, hardwareId=m1.small]
>>>> [[id=cc1.4xlarge, providerId=cc1.4xlarge, name=null, processors=[[cores=4.0,
speed=4.0], [cores=4.0, speed=4.0]], ram=23552, volumes=[[id=null, type=LOCAL, size=10.0,
device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=840.0, device=/dev/sdb,
durable=false, isBootDevice=false], [id=null, type=LOCAL, size=840.0, device=/dev/sdc, durable=false,
isBootDevice=false]], supportsI
>>>>
>>>> but I imagine that if using cluster instances is going to be possible, support
>>>> for amazon linux will be needed.
>>>>
>>>> --Ben
>>>>
>>>>
>>>> On Mar 16, 2011, at 4:07 PM, Andrei Savu wrote:
>>>>
>>>>> I've seen something similar while testing Whirr: WHIRR-264 [0]. We are
>>>>> going to commit WHIRR-158 [1] tomorrow and it should fix the problem
>>>>> you are seeing. We should be able to restart the vote for the 0.4.0
>>>>> release after fixing this issue.
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/WHIRR-264
>>>>> [1] https://issues.apache.org/jira/browse/WHIRR-158
>>>>>
>>>>> -- Andrei Savu / andreisavu.ro
>>>>>
>>>>> On Wed, Mar 16, 2011 at 6:54 PM, Benjamin Clark <ben@daltonclark.com> wrote:
>>>>>> I have been using whirr 0.4 branch to launch clusters of c1.medium
>>>>>> amazon linux machines (whirr.image-id=us-east-1/ami-d59d6bbc, which was the default for new
>>>>>> amazon linux instances, a few days ago) with good success. I took the default hadoop-ec2.properties
>>>>>> recipe and modified it slightly to suit my needs. I'm now trying with basically the same
>>>>>> properties file, but when I use
>>>>>>
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>>
>>>>>> and then either this (from the recipe)
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>>
>>>>>> or this:
>>>>>> # Amazon linux 64-bit, default as of 3/11:
>>>>>> whirr.image-id=us-east-1/ami-8e1fece7
>>>>>>
>>>>>> I get a failure to install the right public key, so that I can't
>>>>>> log into the name node (or any other nodes, for that matter).
>>>>>>
>>>>>>
>>>>>> My whole config file is this:
>>>>>>
>>>>>> whirr.cluster-name=bhcL4
>>>>>> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,4 hadoop-datanode+hadoop-tasktracker
>>>>>> whirr.hadoop-install-function=install_cdh_hadoop
>>>>>> whirr.hadoop-configure-function=configure_cdh_hadoop
>>>>>> whirr.provider=aws-ec2
>>>>>> whirr.identity=...
>>>>>> whirr.credential=...
>>>>>> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop
>>>>>> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa-formyhadoop.pub
>>>>>> whirr.hardware-id=c1.xlarge
>>>>>> #whirr.hardware-id=c1.medium
>>>>>> # Ubuntu 10.04 LTS Lucid. See http://alestic.com/
>>>>>> whirr.image-id=us-east-1/ami-da0cf8b3
>>>>>> # Amazon linux as of 3/11:
>>>>>> #whirr.image-id=us-east-1/ami-8e1fece7
>>>>>> # If you choose a different location, make sure whirr.image-id is updated too
>>>>>> whirr.location-id=us-east-1d
>>>>>> hadoop-hdfs.dfs.permissions=false
>>>>>> hadoop-hdfs.dfs.replication=2
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am I doing something wrong here? I tried with whirr.location-id=us-east-1d
>>>>>> and whirr.location-id=us-east-1
>>>>
>>>>
>>
>>