Date: Thu, 30 Nov 2006 09:48:22 -0800 (PST)
From: "Arun C Murthy (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-657) Free temporary space should be modelled better

    [ http://issues.apache.org/jira/browse/HADOOP-657?page=comments#action_12454714 ]

Arun C Murthy commented on HADOOP-657:
--------------------------------------

It also necessitates a 'Writable' MetricsRecordImpl (for RPC) and some APIs for
'reading' the metrics, i.e. getMetric/getTag APIs that the JobTracker can use to retrieve information.

> Free temporary space should be modelled better
> ----------------------------------------------
>
>                 Key: HADOOP-657
>                 URL: http://issues.apache.org/jira/browse/HADOOP-657
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>
> Currently, there is a configurable amount of disk space that must be free for a task tracker to accept a new task. However, that isn't a very good model of how much space the task is likely to take. I'd like to propose:
> Map tasks: totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
> Reduce tasks: totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces
> where totalInputSize is the total size of all the map inputs for the given job.
> To start a new task, require
> newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation) <=
> free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);
> So in English, we will model the expected sizes of tasks and only take tasks that should leave us a 10% margin. With:
> map.output.growth.factor -- the size of the transient data relative to the map inputs
> mapred.max.scratch.allocation -- the maximum fraction of our disk we want to allocate to tasks.
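The proposed allocation model can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Hadoop's implementation: the class and method names (ScratchSpaceModel, RunningTask, mapAllocation, reduceAllocation, canStartTask) are hypothetical, and the two conf.getFloat(...) lookups are passed in as plain parameters so the example runs without Hadoop on the classpath. It also assumes the admission test is "the new task fits within the budget" (<=), which is what the 10% margin described above requires.

```java
import java.util.List;

/**
 * Hypothetical sketch of the HADOOP-657 disk-space admission check.
 * The growth factor and max scratch allocation stand in for
 * "map.output.growth.factor" and "mapred.max.scratch.allocation".
 */
public class ScratchSpaceModel {

    /** A running task: its estimated scratch allocation (bytes) and progress (0.0 .. 1.0). */
    public static final class RunningTask {
        final double allocation;
        final double done;

        public RunningTask(double allocation, double done) {
            this.allocation = allocation;
            this.done = done;
        }
    }

    /** Map tasks: totalInputSize * growthFactor / numMaps. */
    public static double mapAllocation(long totalInputSize, float growthFactor, int numMaps) {
        return totalInputSize * (double) growthFactor / numMaps;
    }

    /** Reduce tasks: totalInputSize * 2 * growthFactor / numReduces. */
    public static double reduceAllocation(long totalInputSize, float growthFactor, int numReduces) {
        return totalInputSize * 2.0 * growthFactor / numReduces;
    }

    /**
     * Start a task only if its allocation plus the not-yet-consumed share of every
     * running task's allocation fits within maxScratchAllocation of the free disk.
     */
    public static boolean canStartTask(double newTaskAllocation, List<RunningTask> running,
                                       long freeDisk, float maxScratchAllocation) {
        double committed = 0.0;
        for (RunningTask t : running) {
            committed += (1.0 - t.done) * t.allocation; // remaining expected usage
        }
        return newTaskAllocation + committed <= freeDisk * (double) maxScratchAllocation;
    }

    public static void main(String[] args) {
        long totalInputSize = 10_000_000_000L; // 10 GB of map input (illustrative)
        float growthFactor = 1.0f;             // proposed default for map.output.growth.factor
        float maxScratch = 0.90f;              // proposed default for mapred.max.scratch.allocation

        double perMap = mapAllocation(totalInputSize, growthFactor, 100);      // 1e8 bytes
        double perReduce = reduceAllocation(totalInputSize, growthFactor, 10); // 2e9 bytes

        // One reduce is half done, so 1e9 bytes of its allocation are still committed.
        List<RunningTask> running = List.of(new RunningTask(perReduce, 0.5));
        long freeDisk = 2_000_000_000L; // 2 GB free, so the budget is about 1.8e9 bytes

        System.out.println(canStartTask(perMap, running, freeDisk, maxScratch));    // true
        System.out.println(canStartTask(perReduce, running, freeDisk, maxScratch)); // false
    }
}
```

Running main prints true and then false: another map (1e8 + 1e9 bytes) fits within the roughly 1.8e9-byte budget, while another reduce (2e9 + 1e9 bytes) does not. Discounting running tasks by (1.0 - done) means that nearly finished tasks stop blocking new work, which a single fixed free-space threshold cannot express.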