hadoop-common-user mailing list archives

From Steve Sapovits <ssapov...@invitemedia.com>
Subject Re: Amazon S3 questions
Date Sat, 01 Mar 2008 21:06:45 GMT
Tom White wrote:

> From the client's point of view fs.default.name sets the default
> filesystem, and is used to resolve paths that don't specify a
> protocol. You can always use a fully qualified URI to specify the path
> e.g. s3://bucket/a/b or hdfs://nn/a/b. This allows you to e.g.
> take map inputs from HDFS and write reduce outputs to S3.
> For HDFS the setting of fs.default.name in hadoop-site.xml determines
> the host and port for the namenode.
> Does this help? How are you trying to use S3 by the way?
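
The resolution rule Tom describes can be sketched in a few lines of Python. This is only an illustration, not Hadoop's own code: the `qualify` helper and the `hdfs://nn:9000` default are made up for the example, and it handles absolute paths only.

```python
from urllib.parse import urlparse

def qualify(path, default_fs):
    """Return a fully qualified URI for an absolute path.

    Hypothetical helper: default_fs stands in for the value of
    fs.default.name. Paths that already carry a scheme are returned as-is.
    """
    if urlparse(path).scheme:
        # Already qualified, e.g. s3://bucket/a/b or hdfs://nn/a/b.
        return path
    # No scheme: resolve against the default filesystem.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")

print(qualify("/logs/in", "hdfs://nn:9000"))
print(qualify("s3://bucket/a/b", "hdfs://nn:9000"))
```

With a rule like this, a job can read scheme-less paths from the default filesystem while naming an s3:// URI explicitly for its output.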

Yup - I got that far.  It looks like with S3 there is no real name node or data
node cluster -- S3's own distributed storage is used instead (sort of directly).
That's what my question was about.  I like it if that's the case.  Does that make sense?

We will probably be running at least one version of a log writer/map-reducer
on EC2/S3.  Basically, we have large volumes of data related to a specific type
of problem that we map-reduce for analysis.  We've been playing with Pig on
top of map-reduce as well.  Good stuff.

The only gotcha I see:  We wrote (extended really) a SWIG wrapper on top
of the C libhdfs library so we could interface to Python.  It looks like the libhdfs
connect logic isn't using the URI schemes 100% correctly -- I doubt S3 will
work through there.  But that looks like an easy fix if that's the case (I think).
Testing that next ...
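
To make the concern concrete, here is a rough Python sketch of what scheme-aware connect logic might look like. It is not the libhdfs API or our wrapper: `connect_target` is a hypothetical name, and the point is only that the scheme has to be kept alongside the host and port rather than dropped.

```python
from urllib.parse import urlparse

def connect_target(uri):
    """Split a filesystem URI into (scheme, host, port).

    Hypothetical helper for illustration. The port is None when the URI
    does not specify one, as with an s3://bucket URI.
    """
    parsed = urlparse(uri)
    if not parsed.scheme:
        raise ValueError("expected a fully qualified URI, got %r" % uri)
    return parsed.scheme, parsed.hostname, parsed.port

print(connect_target("hdfs://nn:9000"))
print(connect_target("s3://bucket"))
```

A connect call that takes only a host and a port has nowhere to carry the scheme, which is why I suspect an s3:// URI would not make it through unchanged.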

Steve Sapovits
Invite Media  -  http://www.invitemedia.com
