hadoop-hdfs-user mailing list archives

From Aaron Tokhy <aaron.to...@resonatenetworks.com>
Subject Hardware/Software JBOD vs *.data.dir "JBOD"
Date Mon, 30 Jan 2012 21:57:42 GMT
Given an HDFS slave node setup of 3 disks per node, should I have 3
filesystems (one filesystem per disk) in my dfs.data.dir listing, or
should I have a single filesystem on a JBOD setup of 3 disks?  Googling
this problem suggests using "JBOD" instead of RAID 0, but I'm talking
about two different kinds of JBOD: one managed by the OS (mdadm) or
firmware with a single filesystem, and the other managed by the
DataNode (with multiple filesystems).
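
For concreteness, the DataNode-managed option would look roughly like
the following hdfs-site.xml fragment.  This is only a sketch: the
/data/1, /data/2 and /data/3 mount points are hypothetical, and assume
one filesystem per physical disk.

```xml
<!-- hdfs-site.xml (sketch): one data directory per physical disk.
     The /data/N mount points are hypothetical examples. -->
<property>
  <name>dfs.data.dir</name>
  <!-- Comma-separated list; the DataNode spreads blocks across these. -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
```

The OS- or firmware-managed option would instead list a single
directory on the one filesystem spanning all three disks.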

I already have a preference for providing multiple filesystems in the
dfs.data.dir listing, since theoretically the DataNode should properly
handle where it places its blocks (instead of delegating this to
the OS or firmware).  When a drive dies, I could also theoretically swap
in a new drive without worrying about crashing an entire JBOD array
(technically I only lose the blocks on the failing disk, without risking
filesystem-level corruption).  In some ways, I may already know the
answer to my question; I'm just looking for anyone's experience with
this datacenter-wide decision, or whether they have a preference for one
method over the other.

I'm trying to do something along the lines of what is being done in this post:

