hadoop-hdfs-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: Hardware/Software JBOD vs *.data.dir "JBOD"
Date Mon, 30 Jan 2012 22:10:49 GMT
Three disks, each mounted separately. What you say is true: it will
handle failures better and generally perform better. You'll need to
configure the dfs.datanode.failed.volumes.tolerated parameter in
hdfs-site.xml to make sure the DataNode handles a single failed
volume gracefully.
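
For example, a minimal hdfs-site.xml sketch; the /data/1 through
/data/3 mount points are placeholders for wherever your three disks
are actually mounted:

  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>

Note that the default is 0, which means the DataNode shuts itself
down on the first failed volume.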

-Joey

On Mon, Jan 30, 2012 at 4:57 PM, Aaron Tokhy
<aaron.tokhy@resonatenetworks.com> wrote:
> Given an HDFS slave node setup of 3 disks per node, should I have 3
> filesystems (one filesystem per disk) in my dfs.data.dir listing, or should
> I have a single filesystem on a JBOD setup of 3 disks?  Googling this
> problem suggests using "JBOD" instead of RAID 0, but there are two
> different kinds of JBOD here: one managed by the OS (mdadm) or firmware
> with a single filesystem, and the other managed by the DataNode (with
> multiple filesystems).  A sketch of both is below.
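>
> To make the distinction concrete, the two setups would look roughly
> like this (device names and mount points are hypothetical):
>
>   # OS-managed JBOD: concatenate the disks into one linear md array
>   # and put a single filesystem on top of it
>   mdadm --create /dev/md0 --level=linear --raid-devices=3 \
>       /dev/sdb /dev/sdc /dev/sdd
>   mkfs.ext4 /dev/md0
>   mount /dev/md0 /data
>
>   # DataNode-managed "JBOD": one filesystem per disk, each mount
>   # point listed separately in dfs.data.dir
>   mkfs.ext4 /dev/sdb && mount /dev/sdb /data/1
>   mkfs.ext4 /dev/sdc && mount /dev/sdc /data/2
>   mkfs.ext4 /dev/sdd && mount /dev/sdd /data/3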
>
> I already have a preference for providing multiple filesystems in the
> dfs.data.dir listing, since theoretically the DataNode should properly
> handle where it places its blocks (instead of abstracting this to the OS or
> firmware).  When a drive dies, I could also theoretically swap in a new
> drive without losing an entire JBOD array (I would only lose the blocks on
> the failing disk, without risking filesystem-level corruption).  In some
> ways I may already know the answer to my question; I'm just looking for
> anyone's experience with this datacenter-wide decision, or whether they
> prefer one method over the other.
>
>
> I'm trying to go along the same lines as what is being done in this post:
>
> http://old.nabble.com/forum/ViewPost.jtp?post=21423861&framed=y



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
