incubator-hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (Commented) (JIRA)" <>
Subject [jira] [Commented] (HAMA-540) Create distributed sort BSP
Date Fri, 13 Apr 2012 11:57:16 GMT


Thomas Jungblut commented on HAMA-540:

bq.1) It would be good to have the initial (p-1) pivots evenly distributed over the input
data. This way the processors take almost same time to complete the task. How are you planning
to get them?

Yes that is the major point in this algorithm, I use the same naive way to get the pivots.
There is this median in linear time algorithm from Tarjan and Rivest (
However noone is going to iterative over 1TB of data to determine a good pivot.
When you have big data the chances that the data is evenly distributed over the processes
by picking the first n-elements is quite high.

bq. 2) Slightly less than double the size of the input data is passed between the processors
in the sampling sort. If the input is 1TB, then ~2TB is transferred between the nodes. Not
sure if this is OK? It would be nice to know how much data is transferred in merge sort and
quick sort.

That is the key why I think this is not very efficient.

bq. I am just getting a feel of Hama/BSP and thought of implementing sampling sort. Please
do it. I will pick the next easy one 

I'm sorry! You can also implement sampling sort, maybe you interpret the paper other than
> Create distributed sort BSP
> ---------------------------
>                 Key: HAMA-540
>                 URL:
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp, examples
>            Reporter: Thomas Jungblut
> For HAMA-535 we need some kind of sort framework, for various other tasks this could
be as well practical.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message