From: Aaron Kimball
Date: Fri, 09 Nov 2007 14:47:21 -0800
To: hadoop-user@lucene.apache.org
Subject: Re: Tech Talk: Dryad

Is there a timeframe for when rack-locality will be available?

- Aaron

Owen O'Malley wrote:
>
> On Nov 9, 2007, at 8:49 AM, Stu Hood wrote:
>
>> Currently there is no sanctioned method of 'piping' the reduce output
>> of one job directly into the map input of another (although it has
>> been discussed: see the thread I linked before:
>> http://www.nabble.com/Poly-reduce--tf4313116.html ).
>
> Did you read the conclusion of the previous thread? The performance
> gains in avoiding the second map input are trivial compared to the gains
> in simplicity of having a single data path and re-execution story.
> During a reasonably large job, roughly 98% of your maps are reading
> data on the _same_ node. Once we put in rack locality, it will be even
> better.
>
> I'd much much rather build the map/reduce primitive and support it
> very well than add the additional complexity of any sort of
> poly-reduce. I think it is very appropriate for systems like Pig to
> include that kind of optimization, but it should not be part of the
> base framework.
>
> I watched the front of the Dryad talk and was struck by how complex it
> quickly became. It does give the application writer a lot of control,
> but to do the equivalent of a map/reduce sort with 100k maps and 4k
> reduces with automatic spill-over to disk during the shuffle seemed
> _really_ complicated.
>
> On a side note, in the part of the talk that I watched, the scaling
> graph went from 2 to 9 nodes. Hadoop's scaling graphs go to 1000's of
> nodes. Did they ever suggest later in the talk that it scales up higher?
>
> -- Owen