hadoop-common-user mailing list archives

From Stephen Watt <sw...@us.ibm.com>
Subject Re: Getting Hadoop Working on EC2/S3
Date Tue, 30 Sep 2008 20:15:29 GMT
I think we've identified a bug with the create-image command in the ec2 
scripts under src/contrib.

This was my workaround. 

1) Start a single instance of the Hadoop AMI you want to modify using the 
ElasticFox firefox plugin (or the ec2-tools)
2) Modify the /root/hadoop-init script and change the fs.default.name 
property to point to the FULL s3 path to your bucket, as in the sketch 
below these steps (after doing this, make sure you do not make your image 
public!)
3) Follow the instructions at 
for bundling, uploading and registering your new AMI.
4) On your local machine, in the hadoop-ec2-env.sh file, change the 
S3_BUCKET to point to your private s3 bucket where you uploaded your new 
image.  Change the HADOOP_VERSION to your new AMI name.
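
For reference, here is a rough sketch of what steps 2 and 4 end up looking 
like. This assumes the hadoop-init script writes out hadoop-site.xml; 
"my-private-bucket" and the AMI name are placeholders for your own values:

    # Step 2, on the running instance: in /root/hadoop-init, point the
    # default filesystem at your S3 bucket instead of HDFS, e.g.
    #   <property>
    #     <name>fs.default.name</name>
    #     <value>s3://my-private-bucket</value>
    #   </property>

    # Step 4, on your local machine: in src/contrib/ec2/bin/hadoop-ec2-env.sh
    S3_BUCKET=my-private-bucket        # the bucket you uploaded the new AMI to
    HADOOP_VERSION=my-new-ami-name     # whatever you named your rebundled image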

You can now go to your cmd prompt and say "bin/hadoop-ec2 launch-cluster 
myClusterName 5"  and it will bring up 5 instances in a hadoop cluster all 
running off your S3 Bucket instead of HDFS.
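
To sanity-check that the cluster really is running against S3 rather than 
HDFS, something like this should do it (assuming your copy of the script 
has the login command; otherwise just ssh to the master):

    # from your local machine
    bin/hadoop-ec2 launch-cluster myClusterName 5
    bin/hadoop-ec2 login myClusterName

    # then, on the master (hadoop lives under /usr/local on the public AMIs),
    # list the root of the default filesystem -- you should see your S3
    # bucket contents rather than an empty HDFS
    bin/hadoop dfs -ls /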

Kind regards

Steve Watt
IBM Certified IT Architect
Open Group Certified Master IT Architect

Tel: (512) 286 - 9170
Tie: 363 - 9170
Emerging Technologies, Austin, TX
IBM Software Group

"Alexander Aristov" <alexander.aristov@gmail.com>
09/30/2008 01:24 AM
Re: Getting Hadoop Working on EC2/S3

Does your AWS (S3) key contain the "?" sign? If so, that can be the cause.
Regenerate the key in that case.

I have also tried to use the create-image command but I stopped all attempts 
after constant failures. It was easier to make the AMI by hand.
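
My guess at why the key matters (not verified against the scripts): the key 
tends to get embedded directly in the s3:// URI or in unquoted shell 
variables, so a "?" or "/" in it breaks the parsing. Roughly, the two ways to 
hand Hadoop the credentials look like this; the key and bucket values are 
placeholders:

    # Option A: credentials embedded in the URI -- fragile if the secret key
    # contains characters like "?" or "/" (they would need to be URL-escaped)
    bin/hadoop dfs -ls s3://YOUR_ACCESS_KEY_ID:YOUR_SECRET_KEY@my-private-bucket/

    # Option B: keep the key out of the URI and set fs.s3.awsAccessKeyId /
    # fs.s3.awsSecretAccessKey in hadoop-site.xml instead, then simply:
    bin/hadoop dfs -ls s3://my-private-bucket/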


2008/9/29 Stephen Watt <swatt@us.ibm.com>

> Hi Folks
> Before I get started, I just want to state that I've done the due
> diligence and read Tom White's articles as well as the EC2 and S3 pages on
> the Hadoop Wiki, and done some searching on this.
> Thus far I have successfully got Hadoop running on EC2 with no problems.
> In my local hadoop 0.18 environment I simply add my AWS keys to the
> hadoop-ec2-env.sh and kick off the src/contrib/ec2/bin/hadoop-ec2
> launch-cluster script and it works great.
> Now, I'm trying to use the Public Hadoop EC2 images to run over S3 instead
> of HDFS. They are set up to use variables passed in from a parameterized
> launch for all the config options, everything EXCEPT the
> fs.default.filesystem. So in order to bring a cluster of 20 hadoop
> instances up that run over S3, I need to mod the config file to point to
> my S3 bucket for the fs.default.filesystem and keep the rest the same.
> Thus I need my own image to do this.  I am attempting this by using the
> local src/contrib/ec2/bin/hadoop-ec2 create-image script. I've tried this
> both on a windows system (cygwin environment) AND on my ubuntu 8 system,
> and with each one it gets all the way to the end and fails as it tries
> to save the new image to my bucket and says the bucket does not exist with
> a Server.NoSuchBucket (404) error.
> The S3 bucket definitely does exist. I have block data inside of it that
> are results of my Hadoop Jobs. I can go to a single hadoop image on EC2
> that I've launched and manually set up to use S3 and say bin/hadoop dfs
> -ls / and I can see the contents of my S3 bucket. I can also successfully
> use that s3 bucket as an input and output of my jobs for a single EC2
> hadoop instance. I've tried creating new buckets using the FireFox S3
> Organizer plugin and telling the scripts to save my new image to them,
> and it's still the same error.
> Any ideas ? Is anyone having similar problems ?
> Regards
> Steve Watt

Best Regards
Alexander Aristov
