From: Michael Stack
Date: Tue, 02 Jan 2007 21:46:20 -0800
To: hadoop-user@lucene.apache.org
Subject: s3

I'm trying to use the s3 filesystem that was recently added to Hadoop TRUNK. If I set fs.default.name to s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/ so that mapreduce jobs read and write directly against S3, I get the following complaint:

    java.io.IOException: Cannot create file
    /mapred/system/submit_86vwi0/job.jar since parent directory
    /mapred/system/submit_86vwi0 does not exist.

'/mapred/system/' exists, but the temporary job directory 'submit_86vwi0' is never created. This looks like a bug.

How are others making use of the S3 filesystem currently? Are you writing maps/reduces that explicitly get an S3 filesystem for the putting and getting of S3 inputs/outputs?

What I really want is a mapreduce tool that does bulk copies of HDFS outputs to S3 and back again. I've made a start on modifying the CopyFiles tool (distcp), adding an S3 mapper to the mapper factory to complement the existing HDFS and HTTP implementations, but before I go any further: has this been done already?

(Below my sig I've pasted a sketch of the configuration I'm describing, the explicit-filesystem workaround I'm asking about, and the distcp invocation I have in mind.)

Thanks for any feedback,
St.Ack
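
P.S. For reference, here is roughly what my hadoop-site.xml looks like. The fs.s3.awsAccessKeyId/fs.s3.awsSecretAccessKey property names are what I believe the new S3 filesystem reads as an alternative to embedding credentials in the URI -- correct me if the names differ on current TRUNK -- and all values are placeholders:

    <property>
      <name>fs.default.name</name>
      <value>s3://MY_BUCKET</value>
    </property>
    <property>
      <!-- Assumed property name; alternative to putting the
           identifier in the s3:// URI itself. -->
      <name>fs.s3.awsAccessKeyId</name>
      <value>AWS_IDENTIFIER</value>
    </property>
    <property>
      <!-- Assumed property name; alternative to putting the
           secret in the s3:// URI itself. -->
      <name>fs.s3.awsSecretAccessKey</name>
      <value>AWS_SECRET</value>
    </property>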
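
And here is the sort of explicit-filesystem workaround I was asking about: a standalone driver that gets an S3 filesystem by URI and copies job output up by hand. A minimal sketch only -- I'm assuming FileSystem.get(URI, Configuration) and FileUtil.copy are available on your revision of TRUNK, and the hostname, bucket, and paths are made up:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class S3BulkCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Get each filesystem explicitly by URI rather than relying on
        // fs.default.name. S3 credentials come from the config properties
        // (or could be embedded in the s3:// URI as in my mail above).
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
        FileSystem s3 = FileSystem.get(URI.create("s3://MY_BUCKET/"), conf);
        Path src = new Path("/user/stack/output");
        Path dst = new Path("/backups/output");
        // Straight byte copy between the two filesystems; 'false' means
        // the source is left in place after the copy.
        FileUtil.copy(hdfs, src, s3, dst, false, conf);
      }
    }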
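
Finally, the kind of invocation I'm after once CopyFiles/distcp grows an S3 mapper. This is hypothetical -- s3 as a distcp source/destination is exactly the piece I'm proposing to add -- but it would look something like:

    # bulk-copy a completed job's output from HDFS up to S3
    bin/hadoop distcp hdfs://namenode:9000/user/stack/output \
        s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/output

    # ...and back again
    bin/hadoop distcp s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/output \
        hdfs://namenode:9000/user/stack/restored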