From: Mike Spreitzer
To: hama-user@incubator.apache.org
Subject: Re: Comparing BSP and MR
Date: Fri, 9 Dec 2011 14:02:19 -0500
References: <4EE1B704.2000600@apache.org>

It looks to me like the original questions, and the ensuing discussion, mix two concerns. (1) BSP has some abstractions that have been realized in many concrete systems, and these abstractions can be compared to MapReduce independently of the details of particular implementations. (2) The questions also ask for details of the implementations in Giraph and Hama. I think it is very valuable to separate these. Even for concrete systems like Hama and Giraph, the APIs are sufficiently abstract to allow quite a variety of implementations.

Regarding the comparison of abstractions, the first thing to note is that MapReduce omits two important concepts from BSP, namely iteration and per-component state. To make a meaningful comparison, you have to compare BSP with an iterated use of MapReduce. Imagine running the same MapReduce job over and over again, with the output of one iteration being the input to the next. In this case you can pair up each reduce task of one iteration (except those of the final iteration) with a map task of the following iteration, namely the one that reads the reduce task's output. It is that pair that corresponds to a "compute" invocation in BSP.

Note that in iterated MapReduce there are two synchronization barriers per iteration: one between map and reduce, and one between iterations. BSP has just one barrier per iteration. Iterated MapReduce externalizes the intermediate data that flows from one iteration's reduce tasks to the next iteration's map tasks; in the corresponding BSP computation this intermediate data is just held inside the compute invocations.

Regards,
Mike
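
The pairing of a reduce task with the following map task can be illustrated with a minimal sketch. It is not taken from Hama, Giraph, or Hadoop: the toy graph, the function names, and the choice of minimum-label propagation as the computation are all assumptions made for illustration. Both functions compute the same labels; the MapReduce version rebuilds its records every iteration and counts two barriers per iteration (except that nothing follows the last one), while the BSP version keeps per-component state in place and counts one barrier per superstep boundary.

```python
from collections import defaultdict

# Hypothetical toy graph (adjacency lists of an undirected graph).
GRAPH = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}


def iterated_mapreduce(graph, iterations):
    """Min-label propagation as the same MapReduce job run repeatedly."""
    records = {v: v for v in graph}  # externalized data: vertex -> label
    barriers = 0
    for i in range(iterations):
        # Map: each record emits its label to itself and to its neighbors.
        shuffled = defaultdict(list)
        for v, label in records.items():
            shuffled[v].append(label)
            for n in graph[v]:
                shuffled[n].append(label)
        barriers += 1  # barrier between map and reduce
        # Reduce: keep the smallest label seen for each vertex.
        records = {v: min(labels) for v, labels in shuffled.items()}
        if i < iterations - 1:
            barriers += 1  # barrier between this iteration and the next
    return records, barriers


def bsp(graph, iterations):
    """The same computation as BSP supersteps with per-component state."""
    state = {v: v for v in graph}  # state stays inside the components
    inbox = {v: [] for v in graph}
    barriers = 0
    for superstep in range(iterations + 1):
        outbox = {v: [] for v in graph}
        for v in graph:
            # "Reduce" half of compute: fold incoming messages into state.
            state[v] = min([state[v]] + inbox[v])
            # "Map" half of compute: send the label on, unless this is
            # the final superstep.
            if superstep < iterations:
                for n in graph[v]:
                    outbox[n].append(state[v])
        if superstep < iterations:
            barriers += 1  # the single barrier between supersteps
        inbox = outbox
    return state, barriers


if __name__ == "__main__":
    print(iterated_mapreduce(GRAPH, 3))
    print(bsp(GRAPH, 3))
```

In this sketch the first superstep plays the role of the first map phase alone and the last superstep plays the role of the final reduce phase alone, which matches the exception noted above for the final iteration.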