hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAMA-983) Hama runner for DataFlow
Date Mon, 15 Feb 2016 00:38:18 GMT
Edward J. Yoon created HAMA-983:
-----------------------------------

             Summary: Hama runner for DataFlow
                 Key: HAMA-983
                 URL: https://issues.apache.org/jira/browse/HAMA-983
             Project: Hama
          Issue Type: Bug
            Reporter: Edward J. Yoon


As you already know, Apache Beam provides a unified programming model for both batch and streaming
inputs.

The APIs are mostly concerned with filtering and transforming data, so we'll need to implement
a data processing runner along the lines of https://github.com/dapurv5/MapReduce-BSP-Adapter/blob/master/src/main/java/org/apache/hama/mapreduce/examples/WordCount.java
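To make the runner idea concrete, here is a minimal in-memory sketch of how a WordCount-style job (tokenize, group by word, count) could map onto two BSP supersteps: the "shuffle" becomes message exchange between peers, ended by a barrier, and the count happens locally on each peer. It is a simulation in plain Java, not the MapReduce-BSP-Adapter code and not the Hama or Beam API; the class and method names (`BspWordCountSketch`, `shuffle`, `reduce`, `wordCount`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process simulation of a word count executed as BSP supersteps.
// No Hama or Beam classes are used; peers are modelled as inbox lists.
public class BspWordCountSketch {

    /** Superstep 1: tokenize every input split and route each word to the peer that owns it. */
    static List<List<String>> shuffle(List<List<String>> splits, int numPeers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            inboxes.add(new ArrayList<>());
        }
        for (List<String> split : splits) {
            for (String line : split) {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (word.isEmpty()) continue; // leading punctuation yields an empty token
                    // Hash partitioning: every occurrence of a word is sent to the same peer.
                    int target = Math.floorMod(word.hashCode(), numPeers);
                    inboxes.get(target).add(word);
                }
            }
        }
        return inboxes; // in a real BSP job, the barrier synchronization would happen here
    }

    /** Superstep 2: each peer counts the words it received. */
    static Map<String, Integer> reduce(List<String> inbox) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : inbox) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<List<String>> splits, int numPeers) {
        Map<String, Integer> result = new TreeMap<>();
        for (List<String> inbox : shuffle(splits, numPeers)) {
            result.putAll(reduce(inbox)); // peers own disjoint keys, so nothing needs merging
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("the quick brown fox"),
                List.of("the lazy dog", "The end"));
        System.out.println(wordCount(splits, 3));
        // prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

Because all occurrences of a word land on one peer, the per-peer maps have disjoint keys and the final result does not depend on the number of peers.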

Also, implementing a similarity join could be fun. According to http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf,
Apache Hama is the clear winner over Apache Hadoop and Apache Spark.

Since a similarity join consists of transformation, aggregation, and partition computations, I think it's
possible to implement it using the Apache Beam APIs.
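As an illustration of that decomposition, the sketch below runs a naive Jaccard similarity join in three phases: a transformation (record to token set), a partition (group record ids by token, so any two records sharing a token meet in the same group), and an aggregation (verify candidate pairs and deduplicate them). This is not the Heads Join algorithm from the cited paper; it is a simple, assumed stand-in chosen only to show the shape of the computation, and `SimilarityJoinSketch` with its methods are hypothetical names. The threshold is assumed to be greater than zero, which guarantees that every qualifying pair shares at least one token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, in-memory Jaccard similarity join split into transform / partition / aggregate phases.
public class SimilarityJoinSketch {

    /** Transformation: turn a raw record into a set of lower-cased tokens. */
    static Set<String> tokens(String record) {
        Set<String> out = new HashSet<>();
        for (String t : record.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    /** Returns pairs "i-j" (i < j) whose Jaccard similarity is at least threshold (threshold > 0). */
    static Set<String> join(List<String> records, double threshold) {
        List<Set<String>> sets = new ArrayList<>();
        for (String r : records) sets.add(tokens(r));

        // Partition: group record ids by token; in a distributed job each token would be
        // hashed to the peer that owns it. Ids are appended in ascending order.
        Map<String, List<Integer>> groups = new HashMap<>();
        for (int id = 0; id < sets.size(); id++) {
            for (String tok : sets.get(id)) {
                groups.computeIfAbsent(tok, k -> new ArrayList<>()).add(id);
            }
        }

        // Aggregation: verify candidate pairs inside each group; the set drops pairs
        // that were found in more than one group.
        Set<String> result = new TreeSet<>();
        for (List<Integer> ids : groups.values()) {
            for (int x = 0; x < ids.size(); x++) {
                for (int y = x + 1; y < ids.size(); y++) {
                    int i = ids.get(x);
                    int j = ids.get(y);
                    if (jaccard(sets.get(i), sets.get(j)) >= threshold) {
                        result.add(i + "-" + j);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
                "apache hama bsp", "apache hama graph", "apache spark rdd", "hama bsp");
        System.out.println(join(records, 0.5)); // prints [0-1, 0-3]
    }
}
```

Grouping by every token produces many redundant candidates for frequent tokens; practical similarity-join algorithms reduce this with techniques such as prefix filtering, but the transform / partition / aggregate structure stays the same.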



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
