hama-dev mailing list archives

From Chia-Hung Lin <cli...@googlemail.com>
Subject Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Date Fri, 11 Apr 2014 08:12:28 GMT
Or we can build a POC first and then see how it relates to the issues we
might need to fix.

On 11 April 2014 16:10, Chia-Hung Lin <clin4j@googlemail.com> wrote:
> In that case, are we going to organize multiple tasks into a group? A
> job has N bsp groups (a bsp task in the current code), and in turn each
> group contains multiple tasks (with all tasks on the same server)?
>
> If this is the case, how do they send messages or communicate between
> groups? Group to group? Or can a task (within a group) send messages
> arbitrarily?
>
> I have this question because this would have implications for FT. IIRC
> Storm is a CEP framework, and messages can be sent arbitrarily to any
> bolt. The issue with such computation is that checkpointing is not a
> simple task. Generally it's done through communication-induced
> checkpointing. Otherwise, as in Storm, each message is acked and
> replayed when necessary; another option is something like batch
> transactional processing (in Storm, Trident batch processing, if I am
> correct).
>
> What I can think of right now is, with the current structure, treating
> every N messages as one superstep and then checkpointing asynchronously,
> which may be similar to Trident batch processing.
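
The idea above (every N messages form one superstep, followed by an asynchronous checkpoint) can be sketched roughly as follows. This is only a minimal single-process illustration under stated assumptions: the class `BatchedCheckpointSketch`, its fields, and the in-memory `checkpoints` list (standing in for durable storage) are hypothetical and are not part of Hama or Storm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch: one "superstep" per N messages, checkpointed asynchronously. */
class BatchedCheckpointSketch {
  /** Stand-in for durable checkpoint storage (assumption: a real system would write to disk/DFS). */
  final List<Long> checkpoints = new CopyOnWriteArrayList<>();

  private final List<CompletableFuture<Void>> pending = new ArrayList<>();
  private final int batchSize;
  private long runningSum = 0; // the task state being protected by checkpoints
  private int inBatch = 0;     // messages seen since the last superstep boundary

  BatchedCheckpointSketch(int batchSize) {
    this.batchSize = batchSize;
  }

  /** Processes one message; every batchSize messages closes a superstep. */
  void onMessage(long value) {
    runningSum += value;
    if (++inBatch == batchSize) {
      inBatch = 0;
      final long snapshot = runningSum; // copy the state at the superstep boundary
      // Write the snapshot in the background so message processing is not blocked.
      pending.add(CompletableFuture.runAsync(() -> checkpoints.add(snapshot)));
    }
  }

  /** Blocks until every checkpoint started so far has completed. */
  void awaitCheckpoints() {
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
  }

  public static void main(String[] args) {
    BatchedCheckpointSketch sketch = new BatchedCheckpointSketch(3);
    for (long v = 1; v <= 7; v++) {
      sketch.onMessage(v);
    }
    sketch.awaitCheckpoints();
    // Two checkpoints are taken (after messages 3 and 6); their order in the list may vary.
    System.out.println(sketch.checkpoints);
  }
}
```

Because the snapshot is copied before the background write starts, later messages cannot change what gets checkpointed. Recovery would restart from the most recent completed snapshot and replay at most N messages.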
>
> I understand it's still far away based on the current status. I
> suppose it's good if we can take that into consideration beforehand as
> well.
>
>
>
>
>
> On 11 April 2014 13:40, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> Yesterday I surveyed Storm. Storm's task grouping and chainable
>> bolts seem pretty nice (in particular, chainable bolts can be really
>> useful for real-time join operations).
>>
>> I think we can also implement functions similar to Storm's task
>> grouping and chainable bolts on BSP. My rough idea is:
>>
>> 1. Launch multiple tasks per node (as many as the number of bolt groups). For example:
>>
>> +---------------+
>> |    Server1    |
>> +---------------+
>> Task-1. tailing bolt
>> Task-2. split sentence bolt
>> Task-3. wordcount bolt
>>
>> 2. Assign each task to its proper group.
>> --
>> 3. Each task executes its user-defined function and sends messages
>> to the task of the next group.
>> 4. Synchronize all tasks.
>> --
>> 5. Finally, repeat steps 3 and 4.
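
The steps above can be simulated in a single process to see how messages move from group to group, one superstep at a time. This is only a sketch under stated assumptions: the class `ChainedGroupsSketch` and its in-memory inboxes are hypothetical, and no Hama API is used. The tailing task is assumed to emit one log line per superstep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process simulation of three chained task groups. */
class ChainedGroupsSketch {

  /** Runs tailing -> split sentence -> wordcount over the lines and returns the counts. */
  static Map<String, Integer> run(List<String> logLines) {
    List<String> splitInbox = new ArrayList<>(); // messages waiting for Task-2
    List<String> countInbox = new ArrayList<>(); // messages waiting for Task-3
    Map<String, Integer> counts = new TreeMap<>();
    int cursor = 0; // how far Task-1 has read into the log

    // One loop iteration is one superstep: every task computes (step 3), then all sync (step 4).
    while (cursor < logLines.size() || !splitInbox.isEmpty() || !countInbox.isEmpty()) {
      List<String> nextSplitInbox = new ArrayList<>();
      List<String> nextCountInbox = new ArrayList<>();

      // Task-1 (tailing bolt): read one new line and send it to the next group.
      if (cursor < logLines.size()) {
        nextSplitInbox.add(logLines.get(cursor++));
      }
      // Task-2 (split sentence bolt): split received lines and send words to the next group.
      for (String sentence : splitInbox) {
        for (String word : sentence.trim().split("\\s+")) {
          if (!word.isEmpty()) {
            nextCountInbox.add(word);
          }
        }
      }
      // Task-3 (wordcount bolt): count the words received in the previous superstep.
      for (String word : countInbox) {
        counts.merge(word, 1, Integer::sum);
      }
      // Barrier: messages sent during this superstep become visible in the next one.
      splitInbox = nextSplitInbox;
      countInbox = nextCountInbox;
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
  }
}
```

Note that a line needs three supersteps to pass through the whole chain, so the end-to-end latency grows with the number of groups.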
>>
>> Here, the only difficult part is how to determine the task group at the
>> initial superstep. So I'd like to add the method below to the BSPPeer interface.
>>
>>   /**
>>    * @return the names of locally adjacent peers (including this peer).
>>    */
>>   public String[] getAdjacentPeerNames();
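
One possible way a task could use the proposed method to pick its group at the initial superstep is sketched below. This is an assumption, not something stated in the proposal: every task on a server sorts the local peer names and takes its own position in that order as its group number. The class `GroupAssignmentSketch`, the helper `groupOf`, and the sample peer names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: derive a group index from the locally adjacent peer names. */
class GroupAssignmentSketch {

  /**
   * Returns the position of peerName among the sorted local peer names.
   * All tasks on one server see the same names, so they agree on the order
   * without exchanging any messages.
   */
  static int groupOf(String peerName, String[] adjacentPeerNames) {
    String[] sorted = adjacentPeerNames.clone(); // do not modify the caller's array
    Arrays.sort(sorted);
    int index = Arrays.binarySearch(sorted, peerName);
    if (index < 0) {
      throw new IllegalArgumentException(peerName + " is not among the local peers");
    }
    return index;
  }

  public static void main(String[] args) {
    // Sample names only; the array stands in for the result of getAdjacentPeerNames().
    String[] local = {"server1:61003", "server1:61001", "server1:61002"};
    System.out.println(groupOf("server1:61002", local)); // prints 1
  }
}
```

With three tasks per server as in the example, index 0 could map to the tailing bolt, 1 to the split sentence bolt, and 2 to the wordcount bolt.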
>>
>>
>> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>> great~
>>>
>>>
>>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <jira@apache.org>:
>>>
>>>>
>>>>     [
>>>> https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430]
>>>>
>>>> Edward J. Yoon commented on HAMA-883:
>>>> -------------------------------------
>>>>
>>>> NOTE: my colleague is currently working on this issue -
>>>> https://github.com/garudakang/meerkat
>>>>
>>>> > [Research Task] Massive log event aggregation in real time using Apache
>>>> Hama
>>>> >
>>>> ----------------------------------------------------------------------------
>>>> >
>>>> >                 Key: HAMA-883
>>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>>>> >             Project: Hama
>>>> >          Issue Type: Task
>>>> >            Reporter: Edward J. Yoon
>>>> >
>>>> > BSP tasks can be used for aggregating log data streamed in real time.
>>>> With this research task, we might be able to turn this kind of
>>>> processing into a platform.
>>>>
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.2#6252)
>>>>
>>>
>>>
>>>
>>> --
>>> ------
>>> Yexi Jiang,
>>> ECS 251,  yjian004@cs.fiu.edu
>>> School of Computer and Information Science,
>>> Florida International University
>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer Co., Ltd.
