Mailing-List: contact hadoop-user-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-user@lucene.apache.org
Date: Sun, 16 Dec 2007 13:55:02 -0800
Subject: Re: How can the reducer be invoked lazily?
From: Ted Dunning
In-Reply-To: <001201c83fdf$90564950$2401a8c0@ds.corp.yahoo.com>

Devaraj is correct that there is no mechanism to create reduce tasks only
as necessary, but remember that each reducer does many reductions. This
means that empty ranges rarely have a large, unbalanced effect.

If this is still a problem, you can do a few things:

- First, you can use the hash of the real key as the key (and put the real
  key in the value). That will cause empty ranges to be spread all over
  hither and yon, giving you the balance you seek (this behavior may
  actually be the default).

- Second, you can use lots of reducers. If the number of reducers is
  large, then the resources lost to empty ranges will be small, since each
  reducer is doing very little work. If the number of reducers exceeds the
  number of tasks that can run at once, then you get even better balancing,
  because machines that finish empty ranges (quickly) will ask for more
  work.

- Conversely, you can use just a few reducers. This way the empty ranges
  will only be a small part of any given reducer's workload.

Do you have evidence that this is a real problem?


On 12/16/07 4:31 AM, "Devaraj Das" wrote:

> This is not possible. The framework always creates reduce tasks from 0 -
> num_reduces.
>
>> -----Original Message-----
>> From: Rui Shi [mailto:shearershot@yahoo.com]
>> Sent: Saturday, December 15, 2007 7:34 AM
>> To: hadoop-user@lucene.apache.org
>> Subject: How can the reducer be invoked lazily?
>>
>> Hi,
>>
>> How can we specify so that the reducers can be invoked
>> lazily? For instance, I know there are no partitions in the
>> range of 200-300. How can I let the hadoop know that no need
>> to invoke reduce tasks for those partitions?
>>
>> Thanks,
>>
>> Rui
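
As a rough illustration of the first suggestion in the reply (partition on a hash of the real key rather than on key ranges), here is a minimal Python simulation. It is not Hadoop code: the function names, the key space of 1000, the 10 reducers, and the use of CRC32 as the hash are all assumptions chosen for the example. Keys 200-299 are left out to mimic the empty range from the original question.

```python
import zlib
from collections import Counter

KEY_SPACE = 1000     # hypothetical key space: integer keys 0..999
NUM_REDUCERS = 10    # hypothetical number of reduce tasks


def range_partition(key: int, num_reducers: int, key_space: int) -> int:
    """Give each reducer one contiguous slice of the key space."""
    return key * num_reducers // key_space


def hash_partition(key: int, num_reducers: int) -> int:
    """Pick a reducer from a deterministic hash of the key (CRC32 here)."""
    digest = zlib.crc32(str(key).encode("ascii"))
    return digest % num_reducers


def load_per_reducer(partitions, num_reducers: int) -> list:
    """Count how many keys land on each reducer, including idle ones."""
    counts = Counter(partitions)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Keys 200-299 never occur, like the empty range in the question.
keys = [k for k in range(KEY_SPACE) if not 200 <= k < 300]

range_load = load_per_reducer(
    (range_partition(k, NUM_REDUCERS, KEY_SPACE) for k in keys), NUM_REDUCERS
)
hash_load = load_per_reducer(
    (hash_partition(k, NUM_REDUCERS) for k in keys), NUM_REDUCERS
)

print("range partitioning:", range_load)  # reducer 2 receives no keys
print("hash partitioning: ", hash_load)   # the gap is spread across reducers
```

With range partitioning the reducer that owns 200-299 still runs but has nothing to do. With hash partitioning the missing keys are scattered, so every reducer just sees slightly less work.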