hadoop-common-user mailing list archives

From Craig Macdonald <cra...@dcs.gla.ac.uk>
Subject Re: best practice: mapred.local vs dfs drives
Date Mon, 06 Apr 2009 11:51:44 GMT
Thanks for the heads-up.

C

Owen O'Malley wrote:
> We always share the drives.
>
> -- Owen
>
> On Apr 5, 2009, at 0:52, zsongbo <zsongbo@gmail.com> wrote:
>
>> I usually set mapred.local.dir to share the disk space with DFS, since
>> some MapReduce jobs need a lot of temporary space.
>>
>>
>>
>> On Fri, Apr 3, 2009 at 8:36 PM, Craig Macdonald <craigm@dcs.gla.ac.uk> wrote:
>>
>>> Hello all,
>>>
>>> Following recent hardware discussions, I thought I'd ask a related
>>> question. Our cluster nodes have 3 drives: 1x 160GB system/scratch and
>>> 2x 500GB DFS drives.
>>>
>>> The 160GB system drive is partitioned so that 100GB is reserved for
>>> mapred.local space. However, we find that for our application, the free
>>> space in mapred.local for map output is the limiting factor on the
>>> number of reducers we can have (our application prefers fewer reducers).
>>>
>>> How do people normally divide space between DFS and mapred.local? Do you
>>> (a) share the DFS drives with the task tracker temporary files, or do you
>>> (b) keep them on separate partitions or drives?
>>>
>>> We originally went with (b) because it prevented a runaway job from
>>> eating all the DFS space on the machine. However, I'm beginning to
>>> realise the disadvantages.
>>>
>>> Any comments?
>>>
>>> Thanks
>>>
>>> Craig
>>>
>>>
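
To put rough numbers on the trade-off discussed in this thread, here is a
minimal Python sketch. It uses a deliberately simplified model: map output
is assumed to be split evenly across reducers, and each reducer is assumed
to need its whole share on local disk at once. Neither assumption comes
from the thread. The function name min_reducers, the 2000GB total map
output, and the 600GB of free space in the shared layout are hypothetical;
only the 100GB mapred.local partition is taken from Craig's message.

```python
import math


def min_reducers(total_map_output_gb, free_local_gb, reduce_slots_per_node=1):
    """Smallest reducer count whose per-reducer share fits in local space.

    Simplified model (an assumption, not Hadoop's actual accounting):
    map output is split evenly, and reducers running concurrently on a
    node divide that node's free local space equally.
    """
    if free_local_gb <= 0 or reduce_slots_per_node <= 0:
        raise ValueError("free space and slot count must be positive")
    space_per_reducer_gb = free_local_gb / reduce_slots_per_node
    return math.ceil(total_map_output_gb / space_per_reducer_gb)


# Hypothetical job producing 2000GB of map output in total.
TOTAL_MAP_OUTPUT_GB = 2000

# Option (b): dedicated 100GB mapred.local partition (figure from the thread).
dedicated = min_reducers(TOTAL_MAP_OUTPUT_GB, 100)

# Option (a): share the DFS drives; 600GB free is a made-up figure.
shared = min_reducers(TOTAL_MAP_OUTPUT_GB, 600)

print("dedicated partition:", dedicated, "reducers at minimum")
print("shared drives:", shared, "reducers at minimum")
```

Under these assumptions, the more free local space a node has, the lower
the floor on the reducer count, which is the effect Craig describes. The
real limit on a given cluster depends on how much DFS data already
occupies the shared drives.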

