hadoop-common-user mailing list archives

From "Dmitry Pushkarev" <u...@stanford.edu>
Subject RE: Disk configuration.
Date Mon, 13 Jul 2009 19:51:57 GMT
Thanks! I suspected there was a round-robin (RR) scheme, but couldn't find
it anywhere in the documentation.

-----Original Message-----
From: Scott Carey [mailto:scott@richrelevance.com] 
Sent: Monday, July 13, 2009 12:39 PM
To: common-user@hadoop.apache.org; core-user@hadoop.apache.org
Subject: Re: Disk configuration.

For both the DN (DataNode) and TT (TaskTracker) you can provide a
comma-separated list of directories.

So, drive 1 could be /hadoop1
And drive 2 /hadoop2

Then in each of those there could be a dfs directory and another for task
temp storage.

Hadoop will round-robin writes to these automatically.
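
As a rough illustration of the round-robin idea (a sketch only, not Hadoop's
actual code; the class name, the free-space callback, and the behaviour of
skipping a directory that is too full are all assumptions made for this
example):

```python
class RoundRobinDirs:
    """Hand out directories in turn, skipping any without enough free space."""

    def __init__(self, dirs, free_bytes):
        self.dirs = list(dirs)
        # free_bytes is a hypothetical hook: a callable mapping path -> free bytes.
        self.free_bytes = free_bytes
        self.next_index = 0

    def choose(self, needed_bytes):
        # Try each directory at most once, starting where the last call left off.
        for _ in range(len(self.dirs)):
            d = self.dirs[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.dirs)
            if self.free_bytes(d) >= needed_bytes:
                return d
        raise OSError("no configured directory has enough free space")


# Possible usage on a real node (paths taken from the example in this mail):
#   import shutil
#   picker = RoundRobinDirs(["/hadoop1/dfs/data", "/hadoop2/dfs/data"],
#                           lambda p: shutil.disk_usage(p).free)
#   target = picker.choose(64 * 1024 * 1024)
```

With two directories that both have room, successive calls alternate between
them; if one fills up, this sketch simply keeps returning the other.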

dfs.data.dir might look something like:
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop1/dfs/data,/hadoop2/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>

And the local mapreduce dir might look like:
<property>
  <name>mapred.local.dir</name>
  <value>/hadoop1/tmp,/hadoop2/tmp</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of
  directories on different devices in order to spread disk i/o.
  Directories that do not exist are ignored.
  </description>
</property>
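
Since directories that do not exist are ignored, it may be worth checking the
configured paths before starting the daemons. A minimal sketch in Python,
assuming the config is available as XML text; the helper names
configured_dirs and missing_dirs are hypothetical and not part of Hadoop:

```python
import os
import xml.etree.ElementTree as ET


def configured_dirs(xml_text, prop_name):
    """Return the comma-separated directory list for one <property> name."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a bare <property> snippet works too.
    for prop in root.iter("property"):
        if prop.findtext("name") == prop_name:
            value = prop.findtext("value") or ""
            # Stripping whitespace is a convenience of this sketch only.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


def missing_dirs(dirs):
    """Return the entries that are not existing directories on this machine."""
    return [d for d in dirs if not os.path.isdir(d)]


# Possible usage (the file path is an assumption, adjust to your install):
#   with open("conf/hadoop-site.xml") as f:
#       dirs = configured_dirs(f.read(), "mapred.local.dir")
#   print(missing_dirs(dirs))
```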


On 7/13/09 11:50 AM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:

Hi.



We're running a small 30-node cluster and will reinstall all the software
in a few days, so I want to change the HDD configuration, which was set up
a long time ago and seems inefficient: each node has 2x1TB drives that are
combined by LVM into a single volume.



How do people usually set up drives? For example, would it be better to
mount them in two separate folders and feed these folders to both the
tasktracker and datanode? Or set up LVM with RAID 0 to maximize bandwidth?



What I want is for the 2TB of drive space per node to be equally accessible
to both the tasktracker and datanode, and I'm not sure that mounting the two
drives in separate folders achieves that. (For example, if a reducer fills
one drive, will it start writing the rest of the data to the second drive?)



Thanks.



