hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Discussion for memory and scalability issues of Graph package.
Date Thu, 31 Jan 2013 08:12:11 GMT
If I remember correctly, in Hadoop's case, the MR framework merges and
sorts intermediate data files by key between the map and reduce functions.
If we provide this function, I think we can solve the disk queue,
message grouping and message sorting at once.
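
To make the idea concrete, here is a rough sketch of the merge step, not actual Hama code. It assumes each spilled run is already sorted by vertex ID; in-memory lists stand in for the on-disk spill files, and the names SortedRunMerger, Message, RunCursor and mergeAndGroup are made up for illustration. A k-way merge over the runs yields messages in vertex ID order, so grouping only needs to hold one vertex's messages at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;

public class SortedRunMerger {

    /** Hypothetical message type: a value addressed to a vertex ID. */
    static final class Message {
        final int vertexId;
        final double value;

        Message(int vertexId, double value) {
            this.vertexId = vertexId;
            this.value = value;
        }
    }

    /** Cursor over one sorted run; 'head' is the next unread message. */
    private static final class RunCursor {
        private final Iterator<Message> rest;
        Message head;

        RunCursor(Iterator<Message> run) {
            this.rest = run;
            this.head = run.next(); // callers only pass non-empty runs
        }

        /** Moves to the next message; returns false when the run is exhausted. */
        boolean advance() {
            head = rest.hasNext() ? rest.next() : null;
            return head != null;
        }
    }

    /**
     * Merges runs that are each sorted by vertex ID and hands every vertex's
     * messages to onGroup, one vertex at a time, in ascending vertex ID order.
     */
    static void mergeAndGroup(List<List<Message>> sortedRuns,
                              BiConsumer<Integer, List<Double>> onGroup) {
        // Min-heap keyed on the vertex ID at the head of each run.
        PriorityQueue<RunCursor> heap = new PriorityQueue<>(
                Comparator.comparingInt((RunCursor c) -> c.head.vertexId));
        for (List<Message> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new RunCursor(run.iterator()));
            }
        }

        Integer currentId = null;
        List<Double> group = new ArrayList<>();
        while (!heap.isEmpty()) {
            RunCursor c = heap.poll();
            // A new vertex ID means the previous group is complete.
            if (currentId != null && c.head.vertexId != currentId) {
                onGroup.accept(currentId, group);
                group = new ArrayList<>();
            }
            currentId = c.head.vertexId;
            group.add(c.head.value);
            // Re-insert the cursor only if its run still has messages.
            if (c.advance()) {
                heap.add(c);
            }
        }
        if (currentId != null) {
            onGroup.accept(currentId, group); // flush the last group
        }
    }
}
```

A real version would read the runs from spill files and would probably stream a vertex's messages through an iterator rather than collect them in a list, since a single vertex can receive millions of messages.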

BTW, can we specify the queue type per job?

On Thu, Jan 31, 2013 at 4:20 PM, Suraj Menon <surajsmenon@apache.org> wrote:
> Thanks for bringing up our discussion online.
>
> For 1. Let's implement something within bsp-core that could be re-used by
> the graph package. [HAMA-724]
>
> For 2. For a sorted queue, it would be expensive to do all the sorting on the
> sender side. We need to have a send protocol and a receive protocol
> (merge sort) [HAMA-722][HAMA-723]
>
> Regards,
> Suraj
>
> On Wed, Jan 30, 2013 at 3:05 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> Hi devs,
>>
>> As you know, many people report OOM problems with graph algorithms.
>> The problem is message handling. Roughly, every vertex can
>> send or receive as many messages as it has outgoing or incoming
>> links. For example, Barack Obama has 26,000,000+
>> followers.
>>
>> I believe the message queue issue will be fixed by adding a spilling
>> queue. Another issue is grouping messages by vertex ID[1]. To
>> solve this issue, I'm thinking about two ways: 1) Support a grouping
>> function for key-value pair messages in the BSP framework (like
>> Map/Reduce). 2) Write messages to local disk and sort them by vertex ID
>> (external merge sort).
>>
>> If you have any ideas or suggestions, please let me know.
>>
>> 1. https://issues.apache.org/jira/browse/HAMA-704
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
