From: Craig Macdonald
Date: Fri, 03 Apr 2009 13:36:54 +0100
To: core-user@hadoop.apache.org
Subject: best practice: mapred.local vs dfs drives
Message-ID: <49D602E6.2040004@dcs.gla.ac.uk>

Hello all,

Following recent hardware discussions, I thought I'd ask a related question. Our cluster nodes have 3 drives: 1x 160GB system/scratch and 2x 500GB DFS drives.
The 160GB system drive is partitioned so that 100GB is reserved for job mapred.local space. However, we find that for our application, the free mapred.local space available for map output is the limiting factor on the number of reducers we can have (our application prefers fewer reducers).

How do people normally divide space between DFS and mapred.local? Do you

(a) share the DFS drives with the task tracker temporary files, or
(b) keep them on separate partitions or drives?

We originally went with (b) because it prevented a runaway job from eating all the DFS space on the machine. However, I'm beginning to realise the disadvantages.

Any comments?

Thanks,
Craig