hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: RAID-0 vs. JBOD?
Date Thu, 10 Apr 2008 18:01:40 GMT

I haven't done a detailed comparison, but I have seen some effects:

A) RAID usually doesn't perform well on low-end machines compared to
independent drives.  That makes me distrust RAID.

B) Historically, Hadoop hasn't done very well with more than one
partition if the partitions are not roughly equal in size.  Quite frankly,
it doesn't even do all that well with datanodes that have radically
different amounts of available storage.

C) With RAID-0, if you lose any one drive, you lose the whole array.  With
separate partitions, you can lose one drive and keep the data on the others.

These lead to opposite conclusions, so I don't know what to recommend.  If I
had to choose, I think I would do without RAID.
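
To put point C in rough numbers, here is a minimal sketch.  It assumes
every drive fails independently with the same probability p over some
period; both p and the independence assumption are illustrative, not
measured, and the function names are made up for this example.

```python
def raid0_loss_probability(p: float, n: int) -> float:
    """Chance that a RAID-0 stripe over n drives is lost.

    The stripe is lost as soon as at least one drive fails, so this is
    one minus the chance that all n drives survive.
    """
    return 1.0 - (1.0 - p) ** n


def jbod_expected_fraction_lost(p: float, n: int) -> float:
    """Expected fraction of a node's data lost with n equal, separate drives.

    Each drive holds 1/n of the data and fails with probability p,
    so the expectation is n * p * (1/n) = p, whatever n is.
    """
    return n * p * (1.0 / n)


if __name__ == "__main__":
    p = 0.03  # hypothetical per-drive failure probability
    n = 5     # drives per machine, as in the quoted question
    print("RAID-0, chance of losing everything:", raid0_loss_probability(p, n))
    print("JBOD, expected fraction lost:", jbod_expected_fraction_lost(p, n))
```

Under these assumptions the RAID-0 figure grows with the number of
drives, while the expected loss with separate partitions stays at p.
HDFS replication across nodes covers either case; the difference is how
much has to be re-replicated after a failure.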

On 4/10/08 10:29 AM, "Colin Evans" <colin@metaweb.com> wrote:

> We're building a cluster of 40 machines with 5 drives each, and I'm
> curious what people's experiences have been for using RAID-0 for HDFS
> vs. configuring separate partitions (JBOD) and having the datanode
> balance between them.
> I took a look at the datanode code, and datanodes appear to write blocks
> using a round-robin algorithm when managing multiple partitions.  In
> theory, the striping on RAID-0 should be more evenly balanced than this,
> but RAID-0 doesn't seem to give a speedup proportionate to the number of
> drives being striped.  Furthermore, our initial tests seem to suggest
> that the JBOD configuration spends less time in wait state than the
> RAID-0 configuration when running disk-bound jobs.
> We're still tweaking our own benchmarks, so we don't have any conclusive
> results yet.  Has anyone done this kind of comparison before?
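
The round-robin placement described in the quoted message can be
sketched as below.  This is not the datanode's actual code: the class
and method names are hypothetical, block sizes are abstract units, and
skipping full volumes is an assumption of the sketch.

```python
class RoundRobinVolumes:
    """Toy model of round-robin block placement across partitions."""

    def __init__(self, capacities):
        # Free space per volume, in abstract block-sized units.
        self.free = list(capacities)
        self.next_index = 0

    def choose(self, block_size):
        """Return the index of the volume that receives the next block.

        Starts at the volume after the last one used and skips any
        volume without enough free space (an assumption of this sketch).
        """
        n = len(self.free)
        for offset in range(n):
            i = (self.next_index + offset) % n
            if self.free[i] >= block_size:
                self.free[i] -= block_size
                self.next_index = (i + 1) % n
                return i
        raise RuntimeError("no volume has room for the block")


if __name__ == "__main__":
    # Two larger partitions and one smaller one.
    volumes = RoundRobinVolumes([4, 4, 2])
    placements = [volumes.choose(1) for _ in range(10)]
    print(placements)
```

With unequal partitions the small volume fills first, after which all
writes land on the remaining drives.  That is one way the unequal-size
problem mentioned in point B can show up.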
