hama-dev mailing list archives

From "praveen sripati (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-540) Create distributed sort BSP
Date Fri, 13 Apr 2012 11:47:17 GMT

    [ https://issues.apache.org/jira/browse/HAMA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253303#comment-13253303 ]

praveen sripati commented on HAMA-540:

1) It would be good to have the initial (p-1) pivots evenly distributed over the input data.
That way all processors take almost the same time to complete their part of the task. How
are you planning to get the pivots?

In TeraSort#readPartitions, it looks like the first N lines of the input data are read, where
N is the number of reducers. That is not an efficient way to choose them.
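
As an illustration of the point about evenly distributed pivots, here is a minimal sketch
that draws a random sample from the input, sorts it, and takes evenly spaced elements as the
(p-1) pivots. The class and method names (PivotSampler, choosePivots) and the sampleSize and
seed parameters are hypothetical and are not part of Hama or Hadoop; the sketch also assumes
the input fits in a list, which a real BSP job would replace with per-task sampling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PivotSampler {

    /**
     * Picks (p - 1) pivots from a random sample of the input.
     * Hypothetical helper for illustration only, not a Hama or Hadoop API.
     */
    public static List<String> choosePivots(List<String> input, int p,
                                            int sampleSize, long seed) {
        if (p < 2 || input.isEmpty()) {
            return new ArrayList<>();
        }
        Random random = new Random(seed);
        int n = Math.max(Math.min(sampleSize, input.size()), 1);

        // Draw a uniform random sample (with replacement) from the whole
        // input instead of taking only its first lines.
        List<String> sample = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            sample.add(input.get(random.nextInt(input.size())));
        }
        Collections.sort(sample);

        // Take evenly spaced elements of the sorted sample as pivots.
        List<String> pivots = new ArrayList<>(p - 1);
        for (int i = 1; i < p; i++) {
            pivots.add(sample.get((int) ((long) i * n / p)));
        }
        return pivots;
    }
}
```

With a sample that is large relative to p, the pivots should split the input into ranges of
roughly equal size, but that balance is only probabilistic.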


2) Slightly less than double the size of the input data is passed between the processors in
sampling sort. If the input is 1TB, then ~2TB is transferred between the nodes. Is this OK?
It would be nice to know how much data is transferred in merge sort and quick sort.
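
The comment does not show how the "slightly less than double" figure is derived. One way to
arrive at a number of that shape, purely as an assumption, is to suppose that every record is
exchanged in two rounds and that in each round a record stays on its current processor with
probability 1/p. The sketch below does that arithmetic; TransferEstimate and its parameters
are hypothetical names used only for this illustration.

```java
public class TransferEstimate {

    /**
     * Estimates the bytes sent over the network.
     * Assumption (not from the original comment): the data is exchanged in
     * `phases` rounds, and in each round a fraction (p - 1) / p of it moves
     * to a different processor.
     */
    public static double estimateTransferredBytes(double inputBytes, int p, int phases) {
        if (p < 1 || phases < 0) {
            throw new IllegalArgumentException("p must be >= 1 and phases >= 0");
        }
        double fractionMovedPerPhase = (p - 1) / (double) p;
        return phases * fractionMovedPerPhase * inputBytes;
    }
}
```

Under these assumptions, 1TB of input on 16 processors with two rounds gives
2 * (15/16) * 1TB = 1.875TB, which approaches 2TB as p grows.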


I am just getting a feel for Hama/BSP and had thought of implementing sampling sort myself.
Please go ahead with it; I will pick the next easy one :)
> Create distributed sort BSP
> ---------------------------
>                 Key: HAMA-540
>                 URL: https://issues.apache.org/jira/browse/HAMA-540
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>            Reporter: Thomas Jungblut
> For HAMA-535 we need some kind of sort framework, for various other tasks this could
> be as well practical.
