hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HAMA-990) GSoC'16: Apache Hama benchmark against Spark and Flink
Date Fri, 20 May 2016 01:24:13 GMT

    [ https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292508#comment-15292508 ]

Edward J. Yoon edited comment on HAMA-990 at 5/20/16 1:23 AM:
--------------------------------------------------------------

{code}
According to [1] and [3], Apache Flink is faster than Spark in K-Means, Page Rank and Query
Processing whereas Spark is faster in Word Count. We can reproduce these results in our cluster
and then can calculate the results for Hama. Once we have all the results we can compare all
the systems.
{code}

I think this is a good idea. With this, we may be able to derive insights from the results
(this should be our goal). I have heard that Flink uses its own serialization techniques and
shows good performance but is unstable. Just FYI, MRQL can also be used for K-Means and PageRank.

Regarding the cluster, my current cluster (used for my research) consists of only a few high-end
machines equipped with GPUs, so it is not well suited to a large-scale distributed computing
benchmark. If you can write some scripts that automatically produce benchmark results on clouds
such as Amazon or Google Cloud, I can help.




> GSoC'16: Apache Hama benchmark against Spark and Flink
> ------------------------------------------------------
>
>                 Key: HAMA-990
>                 URL: https://issues.apache.org/jira/browse/HAMA-990
>             Project: Hama
>          Issue Type: Documentation
>            Reporter: Behroz Sikander
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
