From: Behroz Sikander
Date: Fri, 31 Jul 2015 21:17:37 +0200
Subject: Re: Hama vs Spark
To: user@hama.apache.org

+1. This is great.

Btw, our current implementation of Hama is synchronous BSP, i.e., we have to
wait for the slowest machine to sync before we can move to the next superstep.
Is there anything like asynchronous BSP out yet? If yes, do you have plans to
add it to this framework?

Regards,
Behroz

On Wed, Jul 29, 2015 at 3:12 AM, Edward J. Yoon wrote:

> I found a research paper somewhat related to this topic.
>
> "Both the disk based method, i.e., MR, and the memory based method,
> i.e., BSP and Spark, need to load the data into main memory and
> conduct the expensive computation. However, when processing top-k
> joins, BSP is clearly the best method as it is the only one that is
> able to perform top-k joins on large datasets.
> This is because BSP
> supports the frequent synchronizations between workers when performing
> the joining procedure, which quickly lowers the joining threshold for
> a given k. The winner between the MR and the Spark algorithms change
> from datasets to datasets: Spark is beaten by MR on A and B while
> beats MR on C." -
> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf
>
> On Thu, Jun 25, 2015 at 9:02 PM, Behroz Sikander wrote:
>
> > Hi all,
> >
> > *>>Apache Spark is definitely more suited for ML (iterative algorithms)
> > than legacy Hadoop due to its preservation of state and optimized
> > execution strategy (RDDs). However, their approaches are still in
> > synchronous iterative communication pattern.*
> > So, Hama has a better communication model. That is a good point.
> >
> > *>>Moreover, BSP can have virtual shared memory and many more benefits.*
> > I read somewhere that Spark has shared variables. Is BSP virtual shared
> > memory something else, or is it like shared variables in Spark?
> >
> > *>>In addition, another one convincing point I think can be a
> > utilization ability of modern acceleration accessories such as
> > InfiniBand and GPUs*
> > Yes, it is a good point, but I found the following link. Apparently,
> > Spark is also capable of processing on GPUs.
> >
> > https://spark-summit.org/east-2015/talk/heterospark-a-heterogeneous-cpugpu-spark-platform-for-deep-learning-algorithms-2
> >
> > *>>I'm sure that this feature will bring a completely new wave of big
> > data. The problem we faced is only a lack of interest to BSP
> > programming model. :-)*
> > My knowledge is quite limited, but I think you are right. With the rise
> > of IoT and stream processing, GPUs will become vital. Yes, I do not
> > understand why BSP is not the programming model of choice nowadays.
> > It has a strong theoretical background that was proposed decades ago,
> > and yet the MapReduce/Spark models are still used.
> >
> > *>>Just FYI, one of my friends said after reading this thread, "if
> > AmazonEC2 = MR or BSP, Google App Engine = Spark". Maybe usability side.*
> > I have not written a Spark job before, but I have seen the code. BSP
> > looks more intuitive to me somehow.
> >
> > *>>Hama = GraphX (Library of Spark (Pregel model) [1])*
> > The graph module of Hama is definitely equal to GraphX of Spark.
> >
> > Regards,
> > Behroz
> >
> > On Thu, Jun 25, 2015 at 1:46 AM, Edward J. Yoon wrote:
> >
> >> Hi, here's my few thoughts.
> >>
> >> Apache Spark is definitely more suited for ML (iterative algorithms)
> >> than legacy Hadoop due to its preservation of state and optimized
> >> execution strategy (RDDs). However, their approaches are still in
> >> synchronous iterative communication pattern.
> >>
> >> In Apache Hama case, it's a general-purpose pure BSP framework. While I
> >> admit that synchronization costs are high, the communication can be
> >> more efficiently realized with the message-passing BSP model. Moreover,
> >> BSP can have virtual shared memory and many more benefits. In addition,
> >> another one convincing point I think can be a utilization ability of
> >> modern acceleration accessories such as InfiniBand and GPUs. I'm sure
> >> that this feature will bring a completely new wave of big data. The
> >> problem we faced is only a lack of interest to BSP programming model. :-)
> >>
> >> > 2) Do we have any recent benchmarks between the 2 systems?
> >>
> >> It's in my todo list.
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
> >> -----Original Message-----
> >> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> Sent: Thursday, June 25, 2015 12:57 AM
> >> To: user@hama.apache.org
> >> Subject: Hama vs Spark
> >>
> >> Hi,
> >> A few days back, I started reading about Apache Spark. It is a pretty
> >> good BigData platform.
> >> But a question comes to mind: where does Hama stand in comparison
> >> with Spark if we have to implement a compute-intensive iterative
> >> algorithm (machine learning or optimization)?
> >>
> >> I found some resources online but none answers my questions.
> >>
> >> 1) BSP vs MapReduce paper
> >> 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
> >> 3) I actually found the following benchmark but it is quite old.
> >>    http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
> >>
> >> Questions:
> >> 1) Is there any specific advantage when we choose the BSP model
> >>    instead of the Spark paradigm?
> >> 2) Do we have any recent benchmarks between the 2 systems?
> >> 3) What is the main convincing point to use Hama over Spark?
> >> 4) Any scientific paper that compares both systems? (I was not able to
> >>    find any)
> >>
> >> Regards,
> >> Behroz Sikander
>
> --
> Best Regards, Edward J. Yoon
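
For readers of the archive, here is a minimal sketch of the synchronous superstep barrier described at the top of this message, where every peer waits for the slowest one before the next superstep starts. It does not use the Hama API: `run_bsp`, `delays`, and the one-thread-per-peer setup are hypothetical names chosen for illustration, and Python's standard `threading.Barrier` stands in for the framework's sync step.

```python
import threading
import time


def run_bsp(delays, supersteps):
    """Simulate synchronous BSP with one thread per peer.

    delays[i] is an assumed per-superstep compute time (seconds) for peer i.
    Returns the ordered log of (superstep, peer) compute completions.
    """
    n = len(delays)
    barrier = threading.Barrier(n)  # reusable: resets after all n peers arrive
    log = []
    lock = threading.Lock()

    def peer(pid):
        for step in range(supersteps):
            time.sleep(delays[pid])      # local computation for this superstep
            with lock:
                log.append((step, pid))  # record that this peer finished computing
            barrier.wait()               # block until the slowest peer arrives

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    # Peer 2 is the straggler; no peer starts superstep s+1 before it ends s.
    for step, pid in run_bsp([0.0, 0.01, 0.03], supersteps=3):
        print(f"superstep {step}: peer {pid} done")
```

In this sketch each superstep takes roughly as long as the slowest peer's delay, which is the waiting cost that an asynchronous BSP variant would presumably try to reduce.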