hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-723) Implement sorting in spilling queue.
Date Tue, 26 Feb 2013 15:16:13 GMT

    [ https://issues.apache.org/jira/browse/HAMA-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587187#comment-13587187 ]

Thomas Jungblut commented on HAMA-723:

Oh great.

Personally, I would totally rethink the messaging. You mentioned earlier the 16k buffer that
is sorted using Quicksort.
I think this is the way to go: we should materialize messages into a DataOutputBuffer as soon as
they are passed to send() (N buffers, one for each outgoing peer, lazily initialized). Once a
threshold is exceeded (given Hadoop RPC's overhead, I guess 4 MB should be optimal?), we sort the
buffer, apply compression if configured, and send it via RPC. The normal spilling queue works the same way, but without

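A minimal sketch of the sender side described above, not Hama code. Everything here is hypothetical: the class `SortedSegmentSender`, the `Transport` interface that stands in for the RPC call, and the use of plain strings instead of messages serialized into a DataOutputBuffer. Compression is left out to keep it short.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one lazily created buffer per peer, flushed as a sorted segment. */
class SortedSegmentSender {

  /** Hypothetical hook standing in for the RPC call. */
  interface Transport {
    void send(String peer, List<String> sortedSegment);
  }

  private final Map<String, List<String>> buffers = new HashMap<>();
  private final Map<String, Integer> bufferedBytes = new HashMap<>();
  private final int thresholdBytes;
  private final Transport transport;

  SortedSegmentSender(int thresholdBytes, Transport transport) {
    this.thresholdBytes = thresholdBytes;
    this.transport = transport;
  }

  /** Buffers a message for the peer and flushes once the threshold is reached. */
  void send(String peer, String message) {
    // The buffer for a peer is created on first use (lazy initialization).
    buffers.computeIfAbsent(peer, p -> new ArrayList<>()).add(message);
    int length = message.getBytes(StandardCharsets.UTF_8).length;
    int size = bufferedBytes.merge(peer, length, Integer::sum);
    if (size >= thresholdBytes) {
      flush(peer);
    }
  }

  /** Sorts the buffered messages of one peer and hands them to the transport. */
  void flush(String peer) {
    List<String> buffer = buffers.get(peer);
    if (buffer == null || buffer.isEmpty()) {
      return;
    }
    List<String> segment = new ArrayList<>(buffer);
    Collections.sort(segment);
    buffer.clear();
    bufferedBytes.put(peer, 0);
    transport.send(peer, segment);
  }
}
```

Because a segment leaves as soon as its buffer crosses the threshold, a caller would still need to call `flush` for every peer at the end of a superstep to push out the remainder.
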
This implies we are removing the bundling, which is okay in my opinion. The receiver side should
know that sorted segments are arriving and merge the data into a single file on local disk.
At the same time we are adding asynchronous messaging, since the data goes over the wire as soon
as a buffer exceeds the threshold.
We should also enforce that an algorithm keeps using the same message class, as that makes it
easier for us to keep a single instance and stop writing class names every time.

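The receiver-side merge could look roughly like the following k-way merge over already sorted segments. This is a sketch under assumptions: `SegmentMerger` and `Head` are hypothetical names, and the result is collected in memory, whereas the proposal above would stream it into a single file on local disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merges sorted segments into one sorted sequence. */
class SegmentMerger {

  /** The current smallest unread element of a segment plus the rest of that segment. */
  private static final class Head<T extends Comparable<T>> implements Comparable<Head<T>> {
    final T value;
    final Iterator<T> rest;

    Head(T value, Iterator<T> rest) {
      this.value = value;
      this.rest = rest;
    }

    @Override
    public int compareTo(Head<T> other) {
      return value.compareTo(other.value);
    }
  }

  /** Each input segment must already be sorted in ascending order. */
  static <T extends Comparable<T>> List<T> merge(List<List<T>> segments) {
    PriorityQueue<Head<T>> heap = new PriorityQueue<>();
    for (List<T> segment : segments) {
      Iterator<T> it = segment.iterator();
      if (it.hasNext()) {
        heap.add(new Head<>(it.next(), it));
      }
    }
    List<T> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      // Take the smallest head, then refill the heap from the same segment.
      Head<T> head = heap.poll();
      merged.add(head.value);
      if (head.rest.hasNext()) {
        heap.add(new Head<>(head.rest.next(), head.rest));
      }
    }
    return merged;
  }
}
```

The heap holds at most one element per segment, so memory use on the merge path depends on the number of segments rather than on their total size.
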
That is going to be a huge patch; should we split it into chunks?

> Implement sorting in spilling queue.
> ------------------------------------
>                 Key: HAMA-723
>                 URL: https://issues.apache.org/jira/browse/HAMA-723
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>            Reporter: Suraj Menon
>            Assignee: Edward J. Yoon
>             Fix For: 0.6.1, 0.7.0
> Implement sorted queue. The sender queue can send segments of sorted data and the receiver
> queue should implement merge sort.
