hadoop-common-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: tuning performance
Date Fri, 13 Mar 2009 19:48:31 GMT
On 3/13/09 11:56 AM, "Allen Wittenauer" <aw@yahoo-inc.com> wrote:

On 3/13/09 11:25 AM, "Vadim Zaliva" <krokodil@gmail.com> wrote:

>>    When you stripe you automatically make every disk in the system have the
>> same speed as the slowest disk.  In our experience, systems are more likely
>> to have a 'slow' disk than a dead one.... and detecting that is really
>> really hard.  In a distributed system, that multiplier effect can have
>> significant consequences on the whole grid's performance.
> All disks are the same, so there is no speed difference.

    There will be when they start to fail. :)

This has been discussed before:

JBOD is going to be better. The only benefit of RAID-0 is slightly easier management in the Hadoop
config, and it is harder to manage at the OS level.
When a single JBOD drive dies, you lose only the data on that drive.  The datanode goes down, but
a restart brings back the parts that still exist.  You can then leave it running while the replacement
is procured. With RAID-0, the whole node is down until you get the new drive and recreate
the RAID.
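
As a rough sketch of the JBOD setup described above: the datanode is given one data directory per drive as a comma-separated list (in Hadoop releases of this era the property is dfs.data.dir). The mount points /data/1 through /data/4 and the hdfs/data suffix below are hypothetical; substitute your own layout.

```shell
#!/bin/sh
# Join one data directory per JBOD mount point into a comma-separated list,
# the form a multi-directory dfs.data.dir value takes.
# Mount points and the directory suffix are assumptions for illustration.
join_data_dirs() {
    suffix=$1
    shift
    out=""
    for mnt in "$@"; do
        # Prepend a comma only when the list is already non-empty.
        out="${out:+$out,}${mnt}/${suffix}"
    done
    printf '%s\n' "$out"
}

join_data_dirs hdfs/data /data/1 /data/2 /data/3 /data/4
```

The printed string is what would go in the value of the data directory property. When a drive dies, its entry can be dropped from the list until the replacement arrives.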

With JBOD, don't forget to set the Linux readahead for the drives to a decent level (you'll
gain up to 25% more sequential read throughput, depending on your kernel version):
blockdev --setra 8192 /dev/<device>.  I also see good gains from using xfs instead of ext3.  For
a big shocker, compare the time it takes to delete a bunch of large files on ext3
(a long time) versus xfs (almost instant).
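
A minimal sketch of applying the readahead setting above to several JBOD drives. The device names /dev/sdb and /dev/sdc are hypothetical, and the APPLY switch is an assumption added here so the script only prints the commands by default. blockdev counts readahead in 512-byte sectors, so 8192 corresponds to 4 MiB.

```shell
#!/bin/sh
# Readahead value from the message, in 512-byte sectors.
READAHEAD_SECTORS=8192

# Convert a readahead value in 512-byte sectors to KiB.
sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Print the blockdev command for each device.
# The command is actually run only when APPLY=1 (requires root).
set_readahead() {
    for dev in "$@"; do
        echo "blockdev --setra $READAHEAD_SECTORS $dev"
        if [ "${APPLY:-0}" = "1" ]; then
            blockdev --setra "$READAHEAD_SECTORS" "$dev"
        fi
    done
}

echo "readahead: $(sectors_to_kib "$READAHEAD_SECTORS") KiB per device"
set_readahead /dev/sdb /dev/sdc   # hypothetical device names
```

The value set this way does not survive a reboot, so it would normally be reapplied from a boot-time script. blockdev --getra /dev/<device> shows the current value.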

The newer drives can do about 120MB/sec at the front of the drive when tuned (xfs,
readahead >4096), while the back of the drive does about 60MB/sec.  If you are not going to use 100%
of the drive for HDFS, use this knowledge and place the partitions appropriately.  The last
20% or so of the drive is a lot slower than the front 60%.  Here is a typical sequential transfer
rate chart for a SATA drive as a function of LBA:
(graphs are about 3/4 of the way down the page before the comments).
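
To make the partition placement concrete, here is a small sketch that computes where an HDFS partition covering the fast front portion of a drive would end. The function name, the 60% figure as a cutoff, and the 2048-sector (1 MiB) alignment are assumptions for illustration, not a recommendation from the thread.

```shell
#!/bin/sh
# Compute the last sector boundary for a partition that covers the first
# PERCENT of a drive, rounded down to a multiple of ALIGN sectors.
# Arguments: total_sectors percent align_sectors
front_end_sector() {
    total=$1
    percent=$2
    align=$3
    raw=$(( total * percent / 100 ))
    # Integer division then multiplication rounds down to the alignment.
    echo $(( raw / align * align ))
}

# Total sector count would come from: blockdev --getsz /dev/<device>
# The value here is a hypothetical example.
TOTAL_SECTORS=1953525168
echo "HDFS partition: sectors 2048 to $(front_end_sector "$TOTAL_SECTORS" 60 2048)"
```

Lower LBAs map to the outer, faster tracks, so the HDFS partition goes first and anything less throughput-sensitive goes after it.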
