hadoop-hdfs-user mailing list archives

From Rohan Rai <rohan....@inmobi.com>
Subject HDFS Read ThroughPut and DISK Read ThroughPut
Date Fri, 14 May 2010 08:23:10 GMT

Is there a relationship between HDFS read throughput and disk read throughput?

If yes, what would it be?

Let's say we have a disk giving us 120 MB/s,

and a cluster of 6 nodes,

each node having 6 disks.

So in an absolutely ideal world it should give us a throughput
of 120*6*6 MB/s (4320 MB/s) if the disks are used in parallel.
In a non-ideal world we can divide the above by some factor x.

Then why is the overall cluster read throughput so much lower?

Generally it hovers around 90 MB/s.

How is the throughput that the cluster provides accounted for?
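
To put numbers on the gap, here is a small sketch of the arithmetic. It uses only the figures quoted in this message, and it assumes the 90 MB/s is the aggregate read rate for the whole cluster. The variable and function names are made up for illustration.

```python
# Figures quoted in this message; names are illustrative only.
DISK_MB_PER_S = 120       # sequential read rate of a single disk
NODES = 6
DISKS_PER_NODE = 6
OBSERVED_MB_PER_S = 90    # assumed to be the cluster-wide aggregate


def ideal_aggregate(disk_mb_s, nodes, disks_per_node):
    """Upper bound if every disk streams in parallel with no other bottleneck."""
    return disk_mb_s * nodes * disks_per_node


ideal = ideal_aggregate(DISK_MB_PER_S, NODES, DISKS_PER_NODE)
x = ideal / OBSERVED_MB_PER_S                       # the "factor of x"
per_disk = OBSERVED_MB_PER_S / (NODES * DISKS_PER_NODE)

print(f"ideal aggregate:   {ideal} MB/s")
print(f"implied factor x:  {x:.0f}")
print(f"observed per disk: {per_disk:.1f} MB/s")
```

Under that assumption the observed rate is below what a single disk can deliver on its own, which is the part I cannot account for.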

Just for information, the configs are: 8 GB RAM, 250 GB HDD, 8 maps per
node, 128 Kb block size.

