hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: bandwidth (Was: Re: Running on multiple CPU's)
Date Mon, 16 Apr 2007 17:44:29 GMT
What are you trying to do?  Hadoop dfs has different goals than a network
file system such as samba.


On 4/16/07 10:32 AM, "jafarim" <jafarim@gmail.com> wrote:

> We ran on Linux and JVM 6 with ordinary IDE disks, a gigabit Ethernet
> switch with matching NICs, and the HDFS from hadoop 0.9.11. We first wrote
> a C program using the native libs provided in the package, then tested
> again with distcp. The scenario was as follows:
> We ran the test on a cluster with 1 node, then added nodes one by one
> until reaching 5 nodes. The same test with samba saturated the link with
> only one node.
> --jaf
> On 4/16/07, Doug Cutting <cutting@apache.org> wrote:
>> Please use a new subject when starting a new topic.
>> jafarim wrote:
>>> Sorry if being off topic, but we experienced a very low bandwidth with
>>> hadoop while copying files to/from the cluster (some 1/100 comparing to
>>> plain samba share). The bandwidth did not improve at all by adding nodes to
>>> the cluster. At that time I thought that hadoop is not supposed to be used
>>> for this purpose and did not use it for my project.
>>> I am just curious how much scalable hadoop is and how bandwidth should grow
>>> as nodes are added to the cluster.
>> It's not clear to me what you tried.  Are you running HDFS?  On how
>> large of a cluster?  What version of Hadoop?  What operating system?
>> How were you copying files to/from the cluster?
>> The 'bin/hadoop distcp' command should scale to consume available
>> network bandwidth and disk i/o.
>> Doug
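
For a sense of scale, the figures quoted above can be turned into rough numbers. This is a back-of-the-envelope sketch, not something from the thread: it assumes the "giga ethernet" link has a nominal rate of 1 Gbit/s, ignores protocol overhead, and takes "saturated the link" and "some 1/100" at face value. The function names are made up for illustration.

```python
# Assumption: nominal 1 Gbit/s link, no protocol overhead accounted for.
LINK_BITS_PER_SEC = 1_000_000_000


def link_ceiling_mb_per_sec(bits_per_sec: int = LINK_BITS_PER_SEC) -> float:
    """Nominal link capacity in megabytes per second (1 MB = 10**6 bytes)."""
    return bits_per_sec / 8 / 1_000_000


def throughput_mb_per_sec(bytes_copied: int, seconds: float) -> float:
    """Observed copy throughput in MB/s from a byte count and elapsed time."""
    return bytes_copied / seconds / 1_000_000


# If samba really saturated the link, it moved data at about the ceiling.
ceiling = link_ceiling_mb_per_sec()

# "Some 1/100" of that would put the reported HDFS copy rate near this value.
reported = ceiling / 100

print(f"link ceiling:    {ceiling:.2f} MB/s")
print(f"1/100 of that:   {reported:.2f} MB/s")
```

Timing a copy of a known size and passing the byte count and elapsed seconds to `throughput_mb_per_sec` gives a figure that can be compared directly against these two numbers as nodes are added.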
