Subject: Getting Hadoop Working on EC2/S3
From: Stephen Watt <swatt@us.ibm.com>
To: core-user@hadoop.apache.org
Date: Mon, 29 Sep 2008 14:19:51 -0500

Hi Folks

Before I get started, I just want to state that I've done the due diligence: I've read Tom White's articles as well as the EC2 and S3 pages on the Hadoop Wiki, and done some searching on this. Thus far I have successfully got Hadoop running on EC2 with no problems. In my local Hadoop 0.18 environment I simply add my AWS keys to hadoop-ec2-env.sh, kick off the src/contrib/ec2/bin/hadoop-ec2 launch-cluster script, and it works great. Now I'm trying to use the public Hadoop EC2 images to run over S3 instead of HDFS. Those images are set up to take all of their config options from variables passed in at a parameterized launch, everything EXCEPT fs.default.name (the default filesystem).
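For reference, getting the stock HDFS-backed cluster up is just a matter of something like this (a sketch from memory of the 0.18 contrib/ec2 scripts; the keys, keypair name and cluster name below are placeholders, not my real values):

# edits to src/contrib/ec2/bin/hadoop-ec2-env.sh
AWS_ACCOUNT_ID=123456789012
AWS_ACCESS_KEY_ID=MY_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=MY_SECRET_KEY
KEY_NAME=gsg-keypair
PRIVATE_KEY_PATH=~/.ec2/id_rsa-gsg-keypair

# then bring up the cluster (name and size are up to you)
src/contrib/ec2/bin/hadoop-ec2 launch-cluster my-hadoop-cluster 20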
So in order to bring up a cluster of 20 Hadoop instances that run over S3, I need to modify the config file to point fs.default.name at my S3 bucket and keep the rest the same, which means I need my own image. I am attempting this with the local src/contrib/ec2/bin/hadoop-ec2 create-image script. I've tried this both on a Windows system (Cygwin environment) AND on my Ubuntu 8 system, and with each one it gets all the way to the end and then fails as it attempts to save the new image to my bucket, saying the bucket does not exist with a Server.NoSuchBucket (404) error.

The S3 bucket definitely does exist. It contains block data written by my Hadoop jobs. I can launch a single Hadoop image on EC2, manually set it up to use S3, run bin/hadoop dfs -ls /, and see the contents of my S3 bucket. I can also successfully use that S3 bucket as the input and output of my jobs on a single EC2 Hadoop instance. I've tried creating new buckets using the Firefox S3 Organizer plugin and pointing the scripts at those instead, and it's still the same error.

Any ideas? Is anyone having similar problems?

Regards
Steve Watt
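P.S. In case it helps anyone reproduce the working single-instance setup, here is roughly what I do by hand on that one image (a sketch; the bucket name and keys are placeholders, HADOOP_HOME stands for wherever Hadoop lives on the image, and rewriting hadoop-site.xml wholesale is just to keep the example short):

# point the default filesystem at the S3 bucket (Hadoop 0.18 S3 block filesystem)
cat > $HADOOP_HOME/conf/hadoop-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>s3://my-hadoop-bucket</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>MY_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>MY_SECRET_KEY</value>
  </property>
</configuration>
EOF

# with the default filesystem pointing at the bucket, this lists its contents
bin/hadoop dfs -ls /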