hadoop-common-user mailing list archives

From: Lee <leely...@gmail.com>
Subject: Re: s3
Date: Sat, 06 Jan 2007 16:34:37 GMT
S3 as input to hadoop is a very cool idea.  Does anyone know how this might
affect performance in general?

Lee

On 1/2/07, Michael Stack <stack@archive.org> wrote:
>
> I'm trying to use the s3 filesystem that was recently added to hadoop
> TRUNK.
>
> If I set fs.default.name to be s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/
> so I can run mapreduce jobs that get and set directly from S3, I get the
> following complaint:
>
> java.io.IOException: Cannot create file
> /mapred/system/submit_86vwi0/job.jar since parent directory
> /mapred/system/submit_86vwi0 does not exist.
>
> While '/mapred/system/' exists, the temporary job directory
> 'submit_86vwi0' is not being created.  This looks like a bug.
>
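
For reference, a default-filesystem setting like the one described above would
normally live in hadoop-site.xml on releases of that era (same placeholders as
in the message; the property name is the standard one, the value is not taken
from this thread):

    <property>
      <name>fs.default.name</name>
      <value>s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/</value>
    </property>

With the default filesystem pointed at S3, schemeless paths such as the
/mapred/system staging directory presumably resolve against the S3 store as
well, which is where the failed directory creation above would come from.
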
> How are others making use of the S3 filesystem currently?  Are ye
> writing maps/reduces that explicitly get an S3 filesystem for putting
> and getting of S3 inputs/outputs?
>
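
On the "explicitly get an S3 filesystem" question, a minimal driver-side sketch,
assuming a Hadoop release where FileSystem.get(URI, Configuration) and
FileUtil.copy(...) are available (the paths are illustrative, not from this
thread; the bucket placeholders follow the message above):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class S3StagingSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Keep fs.default.name pointed at HDFS and open the S3 store explicitly.
        FileSystem s3 = FileSystem.get(
            new URI("s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/"), conf);
        FileSystem hdfs = FileSystem.get(conf);

        // Stage a job input from S3 into HDFS before the MapReduce run ...
        FileUtil.copy(s3, new Path("/inputs/part-00000"),
                      hdfs, new Path("/user/stack/inputs/part-00000"),
                      false, conf);

        // ... and push the job output back to S3 afterwards.
        FileUtil.copy(hdfs, new Path("/user/stack/outputs/part-00000"),
                      s3, new Path("/outputs/part-00000"),
                      false, conf);
      }
    }

That keeps job bookkeeping (submit directories, job.jar, etc.) on HDFS while S3
only holds the bulk inputs and outputs.
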
> What I really want is a mapreduce tool to do bulk copies of HDFS outputs
> to S3 and back again. I made a start on modifying the CopyFiles tool
> (distcp), adding an S3 mapper to its mapper factory to complement the
> existing HDFS and HTTP implementations, but before I go any further:
> perhaps this has been done already?
>
> Thanks for any feedback,
> St.Ack
>
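
On the bulk-copy point above: once CopyFiles/distcp accepts s3:// URIs alongside
hdfs:// and http://, the copy described would presumably boil down to a single
command along these lines (a guess at the eventual invocation, not a documented
feature of the tool as of this thread; the namenode address and paths are
placeholders):

    bin/hadoop distcp hdfs://namenode:9000/user/stack/outputs \
        s3://AWS_IDENTIFIER:AWS_SECRET@MY_BUCKET/outputs

with the reverse direction obtained by swapping source and destination.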
