hadoop-common-user mailing list archives

From "Tom White" <tom.e.wh...@gmail.com>
Subject Re: Namenode Exceptions with S3
Date Fri, 11 Jul 2008 11:21:10 GMT
On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
<lincoln@lincolnritter.com> wrote:
> Thank you, Tom.
>
> Forgive me for being dense, but I don't understand your reply:
>

Sorry! I'll try to explain it better (see below).

>
> Do you mean that it is possible to use the Hadoop daemons with S3 but
> the default filesystem must be HDFS?

The HDFS daemons use the value of "fs.default.name" to set the
namenode host and port, so if you set it to an S3 URI you can't run
the HDFS daemons. In that case you would start only the MapReduce
daemons, using the start-mapred.sh script instead of start-all.sh.
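
For example, to keep HDFS as the default filesystem you'd have
something like this in hadoop-site.xml (the host and port here are
just placeholders):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000/</value>
  </property>

and start everything with bin/start-all.sh. With an s3:// value there
is no namenode address for the HDFS daemons to bind to, which is why
you'd run bin/start-mapred.sh only.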

> If that is the case, can I
> specify the output filesystem on a per-job basis and can that be an S3
> FS?

Yes, that's exactly how you do it.
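
For example (bucket and paths made up), you can read input from HDFS
and send a single job's output to S3 just by passing an s3:// URI as
the output path:

  bin/hadoop jar hadoop-*-examples.jar wordcount \
      /user/lincoln/input s3://mybucket/job-output

The same thing works programmatically by giving an s3:// Path to
FileOutputFormat.setOutputPath().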

>
> Also, is there a particular reason to not allow S3 as the default FS?

You can use S3 as the default FS, it's just that you then can't run
HDFS at all. You would only do this if you don't want to use HDFS,
for example if you were running a MapReduce job which reads from S3
and writes to S3.
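
In that case you'd point fs.default.name at a bucket and supply your
AWS credentials, something along these lines (bucket and keys are
placeholders):

  <property>
    <name>fs.default.name</name>
    <value>s3://mybucket</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>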

It might be less confusing if the HDFS daemons didn't use
fs.default.name to define the namenode host and port. Just like
mapred.job.tracker defines the host and port for the jobtracker,
dfs.namenode.address (or similar) could define the namenode. Would
this be a good change to make?
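
To sketch the idea (dfs.namenode.address doesn't exist today, it's
just the name I'm suggesting), you could then run the HDFS daemons
against an explicit address while keeping S3 as the default
filesystem:

  <property>
    <name>fs.default.name</name>
    <value>s3://mybucket</value>
  </property>
  <property>
    <name>dfs.namenode.address</name>
    <value>namenode.example.com:9000</value>
  </property>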

Tom
