hadoop-common-user mailing list archives

From Vadim Zaliva <kroko...@gmail.com>
Subject Re: tuning performance
Date Sat, 14 Mar 2009 08:53:54 GMT

Thanks for the interesting information. By JBOD, I assume you mean just listing
multiple partition mount points in the Hadoop config?
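
To make the question concrete, here is a minimal Python sketch of what I understand by that, assuming one mount point per physical disk. The mount paths and the `hdfs/data` subdirectory are hypothetical; `dfs.data.dir` is the Hadoop property that takes a comma-separated list of DataNode storage directories.

```python
# Hypothetical mount points, one per physical disk (assumed for illustration).
mount_points = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"]


def jbod_dirs(mounts, subdir):
    """Join one directory per disk into a comma-separated list."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


def property_xml(name, value):
    """Render a single <property> element for a Hadoop config file."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# The DataNode spreads block storage across every directory in this list.
dfs_data_dir = jbod_dirs(mount_points, "hdfs/data")
print(property_xml("dfs.data.dir", dfs_data_dir))
```

The printed element would go into the Hadoop site configuration on each DataNode. No RAID layer is involved; each disk is formatted and mounted separately.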


On Fri, Mar 13, 2009 at 12:48, Scott Carey <scott@richrelevance.com> wrote:
> On 3/13/09 11:56 AM, "Allen Wittenauer" <aw@yahoo-inc.com> wrote:
> On 3/13/09 11:25 AM, "Vadim Zaliva" <krokodil@gmail.com> wrote:
>>>    When you stripe you automatically make every disk in the system have the
>>> same speed as the slowest disk.  In our experience, systems are more likely
>>> to have a 'slow' disk than a dead one.... and detecting that is really
>>> really hard.  In a distributed system, that multiplier effect can have
>>> significant consequences on the whole grid's performance.
>> All disks are the same, so there is no speed difference.
>    There will be when they start to fail. :)
> This has been discussed before:
> http://www.nabble.com/RAID-vs.-JBOD-td21404366.html
> JBOD is going to be better. The only benefit of RAID-0 is slightly easier
> management in the Hadoop config, but it is harder to manage at the OS level.
> When a single JBOD drive dies, you only lose that set of data.  The datanode
> goes down but a restart brings back up the parts that still exist.  Then you
> can leave it be while the replacement is procured... With RAID-0 the whole
> node is down until you get the new drive and recreate the RAID.
> With JBOD, don't forget to set the Linux readahead for the drives to a decent
> level (you'll gain up to 25% more sequential read throughput depending on
> your kernel version):  blockdev --setra 8192 /dev/<device>.  I also see good
> gains by using xfs instead of ext3.  For a big shocker, check out the
> difference in time to delete a bunch of large files with ext3 (long time)
> versus xfs (almost instant).
> The newer drives can do about 120MB/sec at the front of the drive when tuned
> (xfs, readahead >4096) and the back of the drive is 60MB/sec.  If you are not
> going to use 100% of the drive for HDFS, use this knowledge and place the
> partitions appropriately.  The last 20% or so of the drive is a lot slower
> than the front 60%.  Here is a typical sequential transfer rate chart for a
> SATA drive as a function of LBA:
> http://www.tomshardware.com/reviews/Seagate-Barracuda-1.5-TB,2032-5.html
> (graphs are about 3/4 of the way down the page before the comments).
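
On the readahead numbers quoted above: `blockdev --setra` takes its value in 512-byte sectors, so the figures translate to sizes as in the rough Python sketch below. The device name `/dev/sdb` is only a placeholder, and the sketch builds the command string without running it.

```python
SECTOR_BYTES = 512  # blockdev --setra counts readahead in 512-byte sectors


def readahead_bytes(sectors):
    """Convert a --setra value (in sectors) to bytes."""
    return sectors * SECTOR_BYTES


def setra_command(device, sectors):
    """Build the blockdev command line for a device (not executed here)."""
    return f"blockdev --setra {sectors} {device}"


for sectors in (4096, 8192):
    mib = readahead_bytes(sectors) / (1024 * 1024)
    print(f"{sectors} sectors = {mib:.0f} MiB of readahead")

# /dev/sdb is a placeholder device name; the command needs root to run.
print(setra_command("/dev/sdb", 8192))
```

So the suggested 8192 corresponds to 4 MiB of readahead per device, and the ">4096" threshold to anything above 2 MiB.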
