hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: Using S3 Block FileSystem as HDFS replacement
Date Tue, 01 Jul 2008 15:43:57 GMT
By editing hadoop-site.xml you set the default, but I don't recommend changing the default on EC2.
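
For reference, setting the default (which the advice above argues against on EC2) and supplying S3 credentials would look roughly like this in hadoop-site.xml. This is a sketch for the 0.17-era configuration; the bucket name and key values are placeholders:

```xml
<!-- Sketch only: bucket and credential values are placeholders -->
<property>
  <name>fs.default.name</name>
  <value>s3://YOUR-BUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR-ACCESS-KEY-ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR-SECRET-KEY</value>
</property>
```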

But you can specify the filesystem to use through the URL that references your data (jobConf.addInputPath etc.) for a particular job. In the case of the S3 block filesystem, just use an s3:// URL.


On Jun 30, 2008, at 8:04 PM, slitz wrote:

> Hello,
> I've been trying to set up Hadoop to use S3 as the filesystem. I read in the wiki that it's possible to choose either the S3 native filesystem or the S3 block filesystem. I would like to use the S3 block filesystem to avoid the task of "manually" transferring data from S3 to HDFS every time I want to run a job.
> I'm still experimenting with the EC2 contrib scripts, and those seem to be excellent.
> What I can't understand is how it might be possible to use S3 with a public Hadoop AMI, since from my understanding hadoop-site.xml gets written on each instance startup with the options in hadoop-init, and it seems that the public AMI (at least the 0.17.0 one) is not configured to use S3 at all (which makes sense, because the bucket would need individual configuration anyway).
> So... to use the S3 block filesystem with EC2, I need to create a custom AMI with a modified hadoop-init script, right? Or am I completely confused?
> slitz

Chris K Wensel
