hadoop-common-user mailing list archives

From: Michael Stack <st...@archive.org>
Subject: s3
Date: Wed, 03 Jan 2007 05:46:20 GMT
I'm trying to use the S3 filesystem that was recently added to Hadoop.

If I set fs.default.name to s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/ 
so I can run mapreduce jobs that read and write directly against S3, I 
get the following complaint:

java.io.IOException: Cannot create file 
/mapred/system/submit_86vwi0/job.jar since parent directory 
/mapred/system/submit_86vwi0 does not exist.
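
For the record, here is the configuration in code form. This is a 
minimal sketch rather than my actual job setup; the bucket and 
credentials are the same placeholders as above:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class S3DefaultFs {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Same value I have in my site configuration (placeholders):
      conf.set("fs.default.name",
               "s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/");
      FileSystem fs = FileSystem.get(conf);  // resolves to the S3 filesystem
      // The parent directory exists...
      System.out.println(fs.exists(new Path("/mapred/system/")));
      // ...but the per-job submit directory under it never shows up.
    }
  }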

While '/mapred/system/' exists, the temporary job directory 
'submit_86vwi0' is not being created.  This looks like a bug.

How are others making use of the S3 filesystem currently?  Are ye 
writing maps/reduces that explicitly obtain an S3 filesystem for 
putting and getting S3 inputs/outputs?
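
For concreteness, here's the sort of explicit-filesystem code I have 
in mind. It's a minimal sketch with made-up paths and placeholder 
credentials, not production code:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  public class ExplicitS3Copy {
    public static void main(String[] args) throws Exception {
      // Default configuration: fs.default.name stays pointed at HDFS.
      Configuration conf = new Configuration();
      FileSystem hdfs = FileSystem.get(conf);

      // A second configuration pointed at S3 (placeholders again).
      Configuration s3Conf = new Configuration();
      s3Conf.set("fs.default.name",
                 "s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/");
      FileSystem s3fs = FileSystem.get(s3Conf);

      // Copy one finished output file up to S3; deleteSource=false
      // leaves the HDFS copy in place. Paths are illustrative only.
      FileUtil.copy(hdfs, new Path("/user/stack/output/part-00000"),
                    s3fs, new Path("/output/part-00000"),
                    false, conf);
    }
  }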

What I really want is a mapreduce tool to do bulk copies of HDFS 
outputs to S3 and back again. I've made a start on modifying the 
CopyFiles tool (distcp), adding an S3 mapper to its mapper factory to 
complement the existing HDFS and HTTP implementations, but before I go 
any further, has this been done already?
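
Concretely, the end state I'm imagining is a distcp invocation that 
takes S3 URLs on either side, something like the following (host, 
paths, and credentials are placeholders):

  bin/hadoop distcp hdfs://namenode:9000/user/stack/outputs \
      s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/outputs

and the reverse for pulling results back into HDFS.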

Thanks for any feedback,
