hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: s3
Date Wed, 03 Jan 2007 17:47:58 GMT
Michael Stack wrote:
> I'm trying to use the s3 filesystem that was recently added to hadoop 
> TRUNK.
> If I set fs.default.name to be s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/ 
> so I can run mapreduce jobs that get and set directly from S3, I get the 
> following complaint:
> 
> java.io.IOException: Cannot create file 
> /mapred/system/submit_86vwi0/job.jar since parent directory 
> /mapred/system/submit_86vwi0 does not exist.
> 
> While '/mapred/system/' exists, the temporary job directory 
> 'submit_86vwi0' is not being created.  This looks like a bug.

Yes, it does.  Please submit an issue in Jira.
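
For context, the setup Michael describes amounts to pointing the default 
filesystem at S3.  A minimal sketch in Java (the AWS_IDENTIFIER, 
AWS_SECRET and MY_BUCKET placeholders are his, not real values); the same 
value would normally go in hadoop-site.xml as the fs.default.name 
property:

    import org.apache.hadoop.conf.Configuration;

    // Make S3 the default filesystem, so job input, output and the
    // /mapred/system submit directory all live in the bucket
    // (placeholders as in Michael's message).
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/");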

> How are others making use of the S3 filesystem currently?  Are you 
> writing maps/reduces that explicitly get an S3 filesystem for putting 
> and getting S3 inputs/outputs?

I doubt many folks are yet using it for mapreduce, or else they'd have 
encountered this bug.
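
For anyone who does want to try it, the filesystem can be opened 
explicitly instead of being made the default.  A rough sketch, assuming 
FileSystem.get(URI, Configuration) is available on TRUNK (the bucket URI 
is Michael's placeholder; the class name and file paths are hypothetical):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3Put {
      public static void main(String[] args) throws Exception {
        // Open the S3 filesystem directly, leaving fs.default.name alone.
        Configuration conf = new Configuration();
        FileSystem s3 = FileSystem.get(
            new URI("s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/"), conf);

        // Push a local result file into the bucket.
        s3.copyFromLocalFile(new Path("/tmp/part-00000"),
                             new Path("/outputs/part-00000"));
      }
    }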

> What I really want is a mapreduce tool to do bulk copies of HDFS outputs 
> to S3 and back again. I made a start on modifying the CopyFiles tool 
> (distcp) adding to the mapper factory an S3 mapper to complement the 
> already existing HDFS and HTTP implementations but before I go any 
> further, perhaps this has been done already?

I don't think this has been done yet.  Note that CopyFiles should be 
simpler to implement now that URIs are supported directly by Hadoop.
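
In the meantime, URI-based Paths make a single-process copy between the 
two filesystems straightforward.  A sketch, assuming Path.getFileSystem 
and this FileUtil.copy signature are available on TRUNK (the class name, 
host, port and paths are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HdfsToS3 {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path src = new Path("hdfs://namenode:9000/outputs");
        Path dst = new Path("s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/outputs");

        // Sequential copy; an S3-aware CopyFiles would do the same work
        // in parallel as a mapreduce job.
        FileUtil.copy(src.getFileSystem(conf), src,
                      dst.getFileSystem(conf), dst,
                      false /* don't delete the source */, conf);
      }
    }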

Cheers,

Doug
