hadoop-common-user mailing list archives

From slitz <slitzferr...@gmail.com>
Subject Re: Namenode Exceptions with S3
Date Fri, 11 Jul 2008 20:09:02 GMT
I've been learning a lot from this thread, and Tom just helped me
understand some things about S3 and HDFS. Thank you.
To wrap everything up, if we want to use S3 with EC2 we can:

a) Use S3 only, without HDFS, setting fs.default.name to s3://bucket
  -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
java.lang.RuntimeException: Not a host:port pair: XXXXX
b) Use HDFS as the default FS, specifying S3 only as input for the first job
and output for the last (assuming one has multiple jobs on the same data);
see the sketch below
  -> PROBLEM: https://issues.apache.org/jira/browse/HADOOP-3733
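
For b), I'm picturing something along these lines (just a rough sketch
against the current mapred API; "mybucket" and the paths are placeholders,
and fs.default.name stays pointing at the namenode):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class S3InOutJob {
  public static void main(String[] args) throws Exception {
    // fs.default.name stays hdfs://namenode:port in hadoop-site.xml, so the
    // HDFS daemons keep running; only this job's paths point at S3.
    JobConf conf = new JobConf(S3InOutJob.class);
    conf.setJobName("s3-in-out");

    // Pass records through unchanged; a real job would use its own classes.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    // The first job in the chain reads straight from S3 ...
    FileInputFormat.setInputPaths(conf, new Path("s3://mybucket/input"));
    // ... and the last job writes its final output back to S3.
    FileOutputFormat.setOutputPath(conf, new Path("s3://mybucket/output"));

    JobClient.runJob(conf);
  }
}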


So, in my case I cannot use S3 at all for now because of these two problems.
Any advice?

slitz

On Fri, Jul 11, 2008 at 4:31 PM, Lincoln Ritter <lincoln@lincolnritter.com>
wrote:

> Thanks Tom!
>
> Your explanation makes things a lot clearer.  I think that changing
> the 'fs.default.name' to something like 'dfs.namenode.address' would
> certainly be less confusing since it would clarify the purpose of
> these values.
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
> On Fri, Jul 11, 2008 at 4:21 AM, Tom White <tom.e.white@gmail.com> wrote:
> > On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
> > <lincoln@lincolnritter.com> wrote:
> >> Thank you, Tom.
> >>
> >> Forgive me for being dense, but I don't understand your reply:
> >>
> >
> > Sorry! I'll try to explain it better (see below).
> >
> >>
> >> Do you mean that it is possible to use the Hadoop daemons with S3 but
> >> the default filesystem must be HDFS?
> >
> > The HDFS daemons use the value of "fs.default.name" to set the
> > namenode host and port, so if you set it to an S3 URI, you can't run
> > the HDFS daemons. In that case you would use the start-mapred.sh
> > script instead of start-all.sh.
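> >
> > For example (just a sketch):
> >
> >   bin/start-mapred.sh    # starts only the jobtracker and tasktrackers
> >   # bin/start-dfs.sh or bin/start-all.sh would fail here, because the
> >   # namenode can't interpret an s3:// value as a host:port pair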
> >
> >> If that is the case, can I
> >> specify the output filesystem on a per-job basis and can that be an S3
> >> FS?
> >
> > Yes, that's exactly how you do it.
> >
> >>
> >> Also, is there a particular reason to not allow S3 as the default FS?
> >
> > You can make S3 the default FS; it's just that you then can't run
> > HDFS at all. You would only do this if you don't want to use HDFS,
> > for example if you were running a MapReduce job which read from S3
> > and wrote to S3.
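> >
> > For that S3-only case, hadoop-site.xml might look something like this
> > (a sketch only; "mybucket" and the credential values are placeholders):
> >
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>s3://mybucket</value>
> >   </property>
> >   <property>
> >     <name>fs.s3.awsAccessKeyId</name>
> >     <value>YOUR_AWS_ACCESS_KEY_ID</value>
> >   </property>
> >   <property>
> >     <name>fs.s3.awsSecretAccessKey</name>
> >     <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
> >   </property>
> >
> > With that in place you'd start the MapReduce daemons only and point each
> > job's input and output paths at s3:// as well.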
> >
> > It might be less confusing if the HDFS daemons didn't use
> > fs.default.name to define the namenode host and port. Just like
> > mapred.job.tracker defines the host and port for the jobtracker,
> > dfs.namenode.address (or similar) could define the namenode. Would
> > this be a good change to make?
> >
> > Tom
> >
>
