hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HDFS issues..!
Date Wed, 10 Jun 2009 14:31:07 GMT
On Wed, Jun 10, 2009 at 4:55 AM, Sugandha Naolekar wrote:

>         If I want to make the data transfer fast, what am I supposed
> to do? I want to place the data in HDFS and replicate it in a fraction
> of a second.

I want to go to France, but it takes 10+ hours to get there from California
on the fastest plane. How can I get there faster?

> Can that be possible, and how? Placing a 5GB file will take at least
> half an hour or so; but if it's a large cluster, say of 7 nodes, then
> placing it in HDFS would take around 2-3 hours. So how can that time
> delay be avoided?

HDFS will only replicate as many times as you want it to. The write is also
pipelined. This means that writing a 5 GB file replicated to 3 nodes is only
marginally faster than writing the same file to 10 nodes, if for some reason
you wanted to set your replication count to 10 (unnecessary for 99.99999% of
use cases).
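The pipelining point can be sketched with some back-of-envelope arithmetic. This is not HDFS code; the throughput, block size, and per-hop latency figures below are illustrative assumptions:

```python
# Sketch (illustrative numbers, not HDFS measurements): with a pipelined
# write, each block streams through the chain of replicas, so total time
# grows by a small per-hop pipeline-fill cost rather than multiplying by
# the replication factor.

def pipelined_write_seconds(file_mb, throughput_mb_s, replicas,
                            per_hop_latency_s=0.05, block_mb=64):
    """Estimate write time: streaming time plus pipeline fill per block."""
    blocks = -(-file_mb // block_mb)          # ceiling division
    stream_time = file_mb / throughput_mb_s   # bounded by the slowest link
    fill_time = blocks * (replicas - 1) * per_hop_latency_s
    return stream_time + fill_time

five_gb = 5 * 1024  # MB
t3 = pipelined_write_seconds(five_gb, 50, replicas=3)
t10 = pipelined_write_seconds(five_gb, 50, replicas=10)
# Under these assumptions t3 and t10 differ by well under a minute,
# even though the replication factor more than tripled.
```

The streaming term dominates, which is why raising the replication count barely changes the write time.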

>         Also, my simple aim is to transfer the data, i.e. dumping the data
> into HDFS and getting it back whenever needed. So, for this transfer, what
> speed can be achieved?

HDFS isn't magic. You can only write as fast as your disk and network can.
If your disk has 50MB/sec of throughput, you'll probably be limited to
about 50MB/sec. Expecting much more than this in real-life scenarios is
unrealistic.
