From: Ted Dunning
Date: Mon, 01 Oct 2007 19:32:06 -0700
Subject: Re: computing conditional probabilities with Hadoop?
Reply-To: hadoop-user@lucene.apache.org

Actually, it would be almost as useful to be able to have a "multi-reduce". In such a system, you would specify multiple input/map pairs. The reduce function signature would then be something like:

    reduce(WritableComparable key, OutputCollector, Reporter, Iterator ...)

where the output of each set of maps would be given its own iterator.

I didn't mention this alternative earlier because I figured it would be a much bigger leap than just ordering the reduce values. It would, however, be very useful when it comes to co-grouping operations.

On 10/1/07 6:17 PM, "Ted Dunning" wrote:

> This is a common requirement.
>
> Left unchanged would be fine but is probably very hard to enforce because of
> the many map tasks and some uncertainty about which maps finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
>
> On 10/1/07 6:05 PM, "Chris Dyer" wrote:
>
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
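
The "multi-reduce" idea in the reply above, where each input/map pair hands its own iterator to the reduce call, can be sketched outside Hadoop as a small in-memory simulation. This is only an illustration under stated assumptions: MultiReduceSketch, TwoInputReducer, group, multiReduce, and conditionalProbabilities are hypothetical names, not Hadoop APIs, and the sketch is limited to two inputs with String keys. It applies the co-grouping to the thread's topic by feeding marginal counts through one iterator and co-occurring values through the other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch only: every name here is hypothetical, not a Hadoop API.
class MultiReduceSketch {

    /** Hypothetical two-input reducer: each input/map pair gets its own iterator. */
    interface TwoInputReducer<A, B, R> {
        void reduce(String key, Iterator<A> first, Iterator<B> second, Map<String, R> out);
    }

    /** Groups one input's (key, value) records by key; the TreeMap stands in for the sort. */
    static <V> Map<String, List<V>> group(List<Map.Entry<String, V>> records) {
        Map<String, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<String, V> r : records) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return grouped;
    }

    /** Calls the reducer once per key found in either input, in sorted key order. */
    static <A, B, R> Map<String, R> multiReduce(List<Map.Entry<String, A>> firstInput,
                                                List<Map.Entry<String, B>> secondInput,
                                                TwoInputReducer<A, B, R> reducer) {
        Map<String, List<A>> first = group(firstInput);
        Map<String, List<B>> second = group(secondInput);
        TreeSet<String> keys = new TreeSet<>(first.keySet());
        keys.addAll(second.keySet());
        Map<String, R> out = new TreeMap<>();
        for (String key : keys) {
            List<A> as = first.getOrDefault(key, Collections.emptyList());
            List<B> bs = second.getOrDefault(key, Collections.emptyList());
            reducer.reduce(key, as.iterator(), bs.iterator(), out);
        }
        return out;
    }

    /** P(b | a) = count(a, b) / count(a): marginals arrive on one iterator, pairs on the other. */
    static Map<String, Double> conditionalProbabilities(
            List<Map.Entry<String, Integer>> marginalCounts,
            List<Map.Entry<String, String>> pairs) {
        return multiReduce(marginalCounts, pairs, (a, counts, bs, out) -> {
            int total = 0;
            while (counts.hasNext()) {
                total += counts.next();   // partial counts may come from several maps
            }
            if (total == 0) {
                return;                   // no marginal for this key, nothing to emit
            }
            Map<String, Integer> joint = new TreeMap<>();
            while (bs.hasNext()) {
                joint.merge(bs.next(), 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : joint.entrySet()) {
                out.put(e.getKey() + "|" + a, (double) e.getValue() / total);
            }
        });
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> counts = List.of(
                Map.entry("rain", 3), Map.entry("rain", 1), Map.entry("sun", 2));
        List<Map.Entry<String, String>> pairs = List.of(
                Map.entry("rain", "wet"), Map.entry("rain", "wet"), Map.entry("rain", "wet"),
                Map.entry("rain", "dry"), Map.entry("sun", "dry"), Map.entry("sun", "dry"));
        System.out.println(conditionalProbabilities(counts, pairs));
    }
}
```

Because the marginal total comes from its own iterator, the reducer does not depend on any ordering among the values for a key, which is the property the original question was trying to get from a stable or sorted value order.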