Date: Sun, 18 Sep 2016 13:28:20 +0000 (UTC)
From: "JongYoon Lim (JIRA)"
To: dev@hama.apache.org
Reply-To: dev@hama.apache.org
Subject: [jira] [Comment Edited] (HAMA-983) Hama runner for DataFlow

    [ https://issues.apache.org/jira/browse/HAMA-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15500951#comment-15500951 ]

JongYoon Lim edited comment on HAMA-983 at 9/18/16 1:27 PM:
------------------------------------------------------------

Hi, it took some time to understand the Beam API and the Spark and Flink runners for Beam.
It seems that Beam's transforms can be translated to Hama's API as follows, and that the BSP for dataflow could be similar to SuperstepBSP. (If I have misunderstood anything, please correct me.)

BEAM -> HAMA
ParDo -> Superstep
Read.Bound -> RecordReader
Write.Bound -> RecordWriter
Combine -> Combiner
GroupByKey -> ?

I'm going to start with batch mode first, until Hama's streaming is ready. I'll add sub-tasks for this soon.


was (Author: seedengine):
Hi, it takes some time to understand the Beam API and the Spark and Flink runners for Beam.
It seems that Beam's transforms can be translated to Hama's API as follows, and that the BSP for dataflow could be similar to SuperstepBSP. (If I have misunderstood anything, please correct me.)

BEAM -> HAMA
ParDo -> Superstep
Read.Bound -> RecordReader
Write.Bound -> RecordWriter
Combine -> Combiner
GroupByKey -> ?

I'm going to start with batch mode first, until Hama's streaming is ready. I'll add sub-tasks for this soon.

> Hama runner for DataFlow
> ------------------------
>
>                 Key: HAMA-983
>                 URL: https://issues.apache.org/jira/browse/HAMA-983
>             Project: Hama
>          Issue Type: Bug
>            Reporter: Edward J. Yoon
>              Labels: gsoc2016
>
> As you already know, Apache Beam provides a unified programming model for both batch and streaming inputs.
> The APIs are generally associated with data filtering and transforming. So we'll need to implement some data processing runner like https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
> Also, implementing similarity join can be fun. According to http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf, Apache Hama is the clear winner among Apache Hadoop and Apache Spark.
> Since it consists of transformation, aggregation, and partition computations, I think it's possible to implement using Apache Beam APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
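
The BEAM -> HAMA table in the comment could be written down as a plain lookup, which makes the open GroupByKey question explicit. This is only a sketch: the class `BeamToHamaMapping` and its method `hamaCounterpart` are hypothetical names invented for illustration, it uses no Beam or Hama classes, and the write-side entry is keyed as `Write.Bound` on the assumption that this is the transform the comment means.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative lookup table only; not part of Beam or Hama. */
final class BeamToHamaMapping {
    // Beam transform name -> proposed Hama counterpart, as listed in the comment.
    private static final Map<String, String> MAPPING = new LinkedHashMap<>();

    static {
        MAPPING.put("ParDo", "Superstep");
        MAPPING.put("Read.Bound", "RecordReader");
        MAPPING.put("Write.Bound", "RecordWriter");
        MAPPING.put("Combine", "Combiner");
        // GroupByKey is deliberately absent: the comment leaves it as "?".
    }

    private BeamToHamaMapping() {}

    /** Returns the proposed Hama counterpart, or empty if none has been decided. */
    static Optional<String> hamaCounterpart(String beamTransform) {
        return Optional.ofNullable(MAPPING.get(beamTransform));
    }

    public static void main(String[] args) {
        String[] transforms = {"ParDo", "Read.Bound", "Write.Bound", "Combine", "GroupByKey"};
        for (String t : transforms) {
            // Undecided transforms are printed with "?", mirroring the table.
            System.out.println(t + " -> " + hamaCounterpart(t).orElse("?"));
        }
    }
}
```

Returning an empty `Optional` for GroupByKey, rather than a placeholder string, keeps the undecided case visible to any caller that would have to translate it.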