hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-629) Improve RPC Scalability Part 2
Date Mon, 05 Nov 2012 18:30:11 GMT

    [ https://issues.apache.org/jira/browse/HAMA-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490796#comment-13490796

Thomas Jungblut commented on HAMA-629:

I'm reading into Erlang these days and they deal with this problem very interestingly (actually
very simple and naive).
They are building a hierachy of nodes in your cluster. A subcluster contains 10 nodes (maybe
the number of nodes in a rack?), they are completely interconnected with each other, so the
are holding a RPC or socket connection the whole time.

If messages need to be send outside the subcluster, they are sending it to an aggregator (or
multiple aggregators) and the aggregators will send to the real destination.

That's how they archieve scalability, certainly a lot of TCP connections are a big problem
there as well.
> Improve RPC Scalability Part 2
> ------------------------------
>                 Key: HAMA-629
>                 URL: https://issues.apache.org/jira/browse/HAMA-629
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
> There is a problem when all 1k peers would attempt to send to a single peer (let's say
a master task in a graph algorithm that aggregates). In this case the peer will start 1k-threads
which is using enourmous amount of memory. 
> I think we can coordinate the message sending either with Zookeeper or by using the task
id and do a smarter sending chain.
> By the last, I mean, that each task can start at a different offset in the peer array
to start sending messages to the other peers. But this won't solve the problem DDoS'ing a
single master task.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message