Date: Fri, 03 Apr 2009 13:36:54 +0100
From: Craig Macdonald <craigm@dcs.gla.ac.uk>
To: core-user@hadoop.apache.org
Subject: best practice: mapred.local vs dfs drives

Hello all,

Following recent hardware discussions, I thought I'd ask a related 
question. Our cluster nodes have 3 drives: 1x 160GB system/scratch and 
2x 500GB DFS drives.

The 160GB system drive is partitioned so that 100GB is dedicated to
mapred.local (mapred.local.dir) space. However, we find that for our
application, the free space in mapred.local available for map output is
the factor that limits the number of reducers we can have (our
application prefers fewer reducers).

How do people normally divide disk space between DFS and mapred.local?
Do you (a) share the DFS drives with the task tracker temporary files, or
(b) keep them on separate partitions or drives?

We originally went with (b) because it prevented a run-away job from
eating all the DFS space on the machine. However, I'm beginning to
realise its disadvantages.
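For concreteness, here is a minimal sketch of how the two layouts might look in a Hadoop site configuration file such as hadoop-site.xml. The mount points /scratch, /data1 and /data2 are hypothetical, and so is the 50GB figure. dfs.data.dir, mapred.local.dir and dfs.datanode.du.reserved are standard Hadoop properties. The last one sets the number of bytes per volume that the datanode leaves free for non-DFS use, which is one way to keep some headroom for map output under layout (a). The small Python helper only renders the properties as XML.

```python
import xml.etree.ElementTree as ET


def hadoop_site(props):
    """Render a dict of property name -> value as a Hadoop site-config XML string."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


# Hypothetical mount points: /scratch is a partition on the 160GB
# system/scratch drive; /data1 and /data2 are the two 500GB DFS drives.
DFS_DRIVES = ["/data1", "/data2"]
DFS_DATA = ",".join(f"{d}/dfs/data" for d in DFS_DRIVES)

# Option (a): task tracker temporary files share the DFS drives.
option_a = {
    "dfs.data.dir": DFS_DATA,
    "mapred.local.dir": ",".join(f"{d}/mapred/local" for d in DFS_DRIVES),
    # Bytes per volume the datanode leaves free for non-DFS use such as
    # map output. 50GB is an arbitrary, hypothetical example value.
    "dfs.datanode.du.reserved": str(50 * 1024**3),
}

# Option (b): mapred.local lives on its own partition of the system drive.
option_b = {
    "dfs.data.dir": DFS_DATA,
    "mapred.local.dir": "/scratch/mapred/local",
}

print(hadoop_site(option_a))
print(hadoop_site(option_b))
```

With (a), map output can spread over both large drives, and the reservation limits how much of each volume DFS may fill. With (b), a run-away job can only fill the scratch partition, but map output is capped at that partition's size.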

Any comments?

Thanks

Craig