From: "Stu Hood"
Reply-To: stuhood@webmail.us
To: hadoop-user@lucene.apache.org
Date: Mon, 1 Oct 2007 23:24:37 -0400 (EDT)
Subject: Re: computing conditional probabilities with Hadoop?

Have you done any testing to confirm that the order of the output keys is actually changed?

Merge-sort on its own is a 'stable' algorithm, and so the order should not change unless different variations on sorting are used (in memory before spilling to disk, for instance).

Thanks,
Stu


-----Original Message-----
From: Ted Dunning
Sent: Monday, October 1, 2007 10:32pm
To: hadoop-user@lucene.apache.org
Subject: Re: computing conditional probabilities with Hadoop?

Actually, it would be almost as useful to be able to have a "multi-reduce".

In such a system, you would specify multiple input/map pairs.  The reduce
function signature would then be something like:

    reduce(WritableComparable key, OutputCollector, Reporter, Iterator ...)

Where the output of each set of maps would be given its own iterator.

I didn't mention this alternative earlier because I figured it would be a
much bigger leap than just ordering the reduce values.  It would, however,
be very useful when it comes to co-grouping operations.


On 10/1/07 6:17 PM, "Ted Dunning" wrote:

> This is a common requirement.
>
> Left unchanged would be fine but is probably very hard to enforce because of
> the many map tasks and some uncertainty about which maps finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
>
> On 10/1/07 6:05 PM, "Chris Dyer" wrote:
>
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
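
To make the stability point in this thread concrete, here is a minimal Java sketch of a stable two-way merge. It is not Hadoop code and says nothing about what Hadoop's shuffle actually does; the class `StableMergeDemo`, the `Rec` type, and the run names are hypothetical, made up for illustration. It assumes each run is already sorted by key and that ties are resolved in favor of the left-hand run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for illustration only; not part of Hadoop.
final class StableMergeDemo {

    // A key plus a tag recording which run (and position) the record came from.
    static final class Rec {
        final String key;
        final String tag;

        Rec(String key, String tag) {
            this.key = key;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return key + ":" + tag;
        }
    }

    // Merge two runs that are each already sorted by key.
    // On equal keys the left run is taken first; that tie rule is what
    // makes the merge stable with respect to the (left, right) ordering.
    static List<Rec> merge(List<Rec> left, List<Rec> right) {
        List<Rec> out = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).key.compareTo(right.get(j).key) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }

    // Render records as "key:tag" separated by single spaces.
    static String render(List<Rec> recs) {
        StringBuilder sb = new StringBuilder();
        for (Rec r : recs) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(r);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Rec> runA = Arrays.asList(
                new Rec("a", "A1"), new Rec("b", "A2"), new Rec("b", "A3"));
        List<Rec> runB = Arrays.asList(
                new Rec("a", "B1"), new Rec("b", "B2"));

        // Run A merged first: prints "a:A1 a:B1 b:A2 b:A3 b:B2"
        System.out.println(render(merge(runA, runB)));
        // Run B merged first: prints "a:B1 a:A1 b:B2 b:A2 b:A3"
        System.out.println(render(merge(runB, runA)));
    }
}
```

Within each run, the relative order of equal keys survives the merge in both calls. The order across runs, however, depends on which run is on the left. That matches the concern raised earlier in the thread: even a stable merge only preserves whatever order the runs arrive in, so if that arrival order is not fixed, the order of equal keys in the output is not fixed either.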