From: "Paul Sutter"
To: hadoop-user@lucene.apache.org
Date: Tue, 25 Jul 2006 11:01:00 -0700
Subject: Re: Task type priorities during scheduling ?

First, it matters in the case of concurrent jobs. If you submit a 20-minute job while a 20-hour job is running, it would be nice if the reducers for the 20-minute job could get a chance to run before the 20-hour job's mappers have all finished. So even without a throughput improvement, you have an important capability (although it may require another minor tweak or two to make possible).

Second, we often have stragglers, where one mapper runs slower than the others. When this happens, we end up with a largely idle cluster for as long as an hour. In cases like these, good support for concurrent jobs _would_ improve throughput.

Paul

On 7/25/06, Doug Cutting wrote:
> Paul Sutter wrote:
> > it should be possible to have lots of tasks in the shuffle phase
> > (mostly, sitting around waiting for mappers to run), but only have
> > "about" one actual reduce phase running per cpu (or whatever works for
> > each of our apps) that gets enough memory for a sorter, does
> > substantial computation, etc.
>
> Ah, now I see your point, although I don't see how this would improve
> overall throughput. In most cases, the optimal configuration is for the
> total number of reduce tasks to be roughly the total number of reduces
> that can run at once. So there is no queue of waiting reduce tasks to
> schedule.
>
> Doug
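
The idea quoted above (many tasks allowed to sit in the shuffle phase, but only "about" one actual reduce phase per CPU) can be illustrated with a small sketch. This is not Hadoop code and does not reflect how Hadoop schedules tasks; the function name `run_reduce_tasks`, its parameters, and the use of threads and a semaphore are all assumptions made purely for illustration.

```python
import threading
import time


def run_reduce_tasks(num_tasks, reduce_slots):
    """Hypothetical illustration: start num_tasks reduce tasks at once, but
    let at most reduce_slots of them be in the sort/reduce phase together.

    Returns (peak, finished): the largest number of tasks observed in the
    reduce phase at the same time, and the number of tasks that completed.
    """
    maps_done = threading.Event()             # stands in for "all map output is available"
    gate = threading.Semaphore(reduce_slots)  # limits the expensive phase
    lock = threading.Lock()
    state = {"active": 0, "peak": 0, "finished": 0}

    def task():
        # Shuffle phase: mostly waiting for mappers, so many tasks can sit here cheaply.
        maps_done.wait()
        # Sort/reduce phase: needs memory and CPU, so only reduce_slots run at once.
        with gate:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # placeholder for real sorting and reducing
            with lock:
                state["active"] -= 1
                state["finished"] += 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    maps_done.set()  # the mappers "finish"; waiting tasks now compete for slots
    for t in threads:
        t.join()
    return state["peak"], state["finished"]


if __name__ == "__main__":
    peak, finished = run_reduce_tasks(num_tasks=8, reduce_slots=2)
    print("peak concurrent reduce phases:", peak, "tasks finished:", finished)
```

Here all eight tasks are started immediately, yet the semaphore keeps the memory-hungry phase bounded by `reduce_slots`. Whether this helps throughput is exactly the point under discussion in the thread; the sketch only shows the gating mechanism.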