hadoop-common-user mailing list archives

From jafarim <jafa...@gmail.com>
Subject Re: Running on multiple CPU's
Date Mon, 16 Apr 2007 17:10:43 GMT
Sorry if this is off topic, but we experienced very low bandwidth with
Hadoop while copying files to/from the cluster (roughly 1/100th of a
plain Samba share). The bandwidth did not improve at all when we added
nodes to the cluster. At the time I concluded that Hadoop was not meant
to be used this way and did not use it for my project.
I am just curious how scalable Hadoop is and how bandwidth should grow
as nodes are added to the cluster.

On 4/16/07, Doug Cutting <cutting@apache.org> wrote:
> Eelco Lempsink wrote:
> > Inspired by
> > http://www.mail-archive.com/nutch-user@lucene.apache.org/msg02394.html
> > I'm trying to run Hadoop on multiple CPU's, but without using HDFS.
> To be clear: you need some sort of shared filesystem, if not HDFS, then
> NFS, S3, or something else.  For example, the job client interacts with
> the job tracker by copying files to the shared filesystem named by
> fs.default.name, and job inputs and outputs are assumed to come from a
> shared filesystem.
> So, if you're using NFS, then you'd set fs.default.name to something
> like "file:///mnt/shared/hadoop/".  Note also that as your cluster
> grows, NFS will soon become a bottleneck.  That's why HDFS is provided:
> there aren't other readily available shared filesystems that scale
> appropriately.
> Doug
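
For reference, Doug's suggestion maps to an entry in hadoop-site.xml (the per-site configuration override file in Hadoop of that era). A minimal sketch, assuming an NFS share mounted at the illustrative path /mnt/shared/hadoop:

```xml
<!-- hadoop-site.xml: point Hadoop at a shared NFS mount instead of HDFS.
     The mount path /mnt/shared/hadoop is illustrative; substitute your own. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/shared/hadoop/</value>
  </property>
</configuration>
```

With this setting, the job client and job tracker exchange job files through the shared mount, and job inputs/outputs are read from and written to it as well.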
