hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Date Sat, 12 Apr 2014 00:25:29 GMT
No. Please read my mail again. One task creates the topology map and
broadcasts it to all peers at the first superstep.

MapWritable<GroupName, List<HostName>> topology;
..
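
A minimal sketch of how that one task could build the map before broadcasting it. It assumes peer names have the form "host:port" and that tasks sharing a port belong to the same group, as in the Group1..Group3 listing quoted below. TopologySketch and groupByPort are hypothetical names, not part of the Hama API, and plain java.util collections stand in for the Writable types. In a real job the peer list would presumably come from the BSPPeer (e.g. its list of all peer names).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Hama API.
final class TopologySketch {

  // Groups peer names of the form "host:port" by port, so that the tasks
  // listening on the same port across all servers land in the same group.
  static Map<String, List<String>> groupByPort(List<String> peerNames) {
    // TreeMap keeps the ports sorted, which makes group numbering deterministic.
    Map<Integer, List<String>> byPort = new TreeMap<>();
    for (String peer : peerNames) {
      int port = Integer.parseInt(peer.substring(peer.lastIndexOf(':') + 1));
      byPort.computeIfAbsent(port, p -> new ArrayList<>()).add(peer);
    }
    // Name the groups Group1, Group2, ... in ascending port order.
    Map<String, List<String>> topology = new LinkedHashMap<>();
    int groupNo = 1;
    for (List<String> members : byPort.values()) {
      topology.put("Group" + groupNo, members);
      groupNo++;
    }
    return topology;
  }
}
```

With the nine peers server1..server3 on ports 60001..60003, this yields three groups of three peers each, and a task in Group1 can look up the members of Group2 as its downstream targets.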

On Sat, Apr 12, 2014 at 3:16 AM, Chia-Hung Lin <clin4j@googlemail.com> wrote:
> No problem. It's a good discussion so we can examine and improve accordingly.
>
> I am still not very sure about the topology, or how tasks are grouped.
> From the description, it seems to look like the link below:
>
> http://i.imgur.com/92L2XY1.png
>
> Each GroomServer is viewed as a group, and each group will launch 3
> tasks by default (as defined in the default XML). So the corresponding
> messages, emitted from a source such as a queue, are sent to each group
> for consumption? And how do tasks communicate between groups/tasks?
>
>
>
>
> On 11 April 2014 16:43, Edward J. Yoon <edward@datasayer.com> wrote:
>> My rough idea assumes that a dedicated Hama is installed on the machines that
>> generate logs, and that the same number of child tasks is launched on each
>> GroomServer. So, if groups == 3, the framework launches 3 tasks per node.
>> At the first superstep, one task broadcasts the topology after grouping the
>> tasks into 3 groups.
>>
>> == Group1 ==
>> server1:60001
>> server2:60001
>> server3:60001
>>
>> == Group2 ==
>> server1:60002
>> server2:60002
>> server3:60002
>>
>> == Group3 ==
>> server1:60003
>> server2:60003
>> server3:60003
>>
>> Based on this topology, each task loads the proper class by reflection and
>> executes it. Then, it'll work like a Storm flow. I haven't thought about the
>> FT issue yet. :-)
>>
>>
>>
>> On Fri, Apr 11, 2014 at 5:12 PM, Chia-Hung Lin <clin4j@googlemail.com> wrote:
>>
>>> Or we can have a POC first and then see how it relates to the issue we
>>> might need to fix.
>>>
>>> On 11 April 2014 16:10, Chia-Hung Lin <clin4j@googlemail.com> wrote:
>>> > In that case, are we going to organize multiple tasks into a group? A
>>> > job has N bsp groups (bsp tasks in the current code), and in turn each
>>> > group contains multiple tasks (and all tasks are on the same server)?
>>> >
>>> > If this is the case, how do they send messages or communicate between
>>> > groups? Group to group? Can a task (within a group) send messages
>>> > arbitrarily?
>>> >
>>> > I have this question because this would have implications for FT. IIRC
>>> > Storm is a CEP framework, and messages can be sent arbitrarily to every
>>> > bolt. The issue with such computation is that performing a checkpoint
>>> > is not a simple task. Generally it's done through
>>> > communication-induced checkpointing. Otherwise, like Storm, they ack
>>> > and redo each message when necessary; another option is something like
>>> > batch transactional processing (in Storm, Trident batch processing, if
>>> > I am correct).
>>> >
>>> > What I can think of right now is, with the current structure, grouping
>>> > every N messages into a superstep and then checkpointing asynchronously,
>>> > which may be similar to Trident batch processing.
>>> >
>>> > I understand it's still far away based on the current status. I
>>> > suppose it's good if we can take that into consideration beforehand as
>>> > well.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On 11 April 2014 13:40, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>> >> Yesterday, I surveyed Storm. Storm's task grouping and chainable
>>> >> bolts seem pretty nice (in particular, chainable bolts can be really
>>> >> useful for real-time join operations).
>>> >>
>>> >> I think we can also implement functions similar to Storm's task
>>> >> grouping and chainable bolts on BSP. My rough idea is:
>>> >>
>>> >> 1. Launch multiple tasks per node (as many as there are groups of
>>> >> bolts). For example:
>>> >>
>>> >> +---------------+
>>> >> |    Server1    |
>>> >> +---------------+
>>> >> Task-1. tailing bolt
>>> >> Task-2. split sentence bolt
>>> >> Task-3. wordcount bolt
>>> >>
>>> >> 2. Assign the tasks to the proper group.
>>> >> --
>>> >> 3. Each task executes its user-defined function and sends messages
>>> >> to the tasks of the next group.
>>> >> 4. Synchronize all tasks.
>>> >> --
>>> >> 5. Finally, repeat steps 3 and 4 above.
>>> >>
>>> >> Here, the only difficult part is how to determine the task group at the
>>> >> initial superstep. So, I'd like to add the method below to the BSPPeer
>>> >> interface.
>>> >>
>>> >>   /**
>>> >>    * @return the names of locally adjacent peers (including this peer).
>>> >>    */
>>> >>   public String[] getAdjacentPeerNames();
>>> >>
>>> >>
>>> >> On Thu, Apr 3, 2014 at 11:00 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>> >>> great~
>>> >>>
>>> >>>
>>> >>> 2014-04-02 21:43 GMT-04:00 Edward J. Yoon (JIRA) <jira@apache.org>:
>>> >>>
>>> >>>>
>>> >>>>     [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ]
>>> >>>>
>>> >>>> Edward J. Yoon commented on HAMA-883:
>>> >>>> -------------------------------------
>>> >>>>
>>> >>>> NOTE: my fellow worker is currently working on this issue -
>>> >>>> https://github.com/garudakang/meerkat
>>> >>>>
>>> >>>> > [Research Task] Massive log event aggregation in real time using Apache Hama
>>> >>>> > ----------------------------------------------------------------------------
>>> >>>> >
>>> >>>> >                 Key: HAMA-883
>>> >>>> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>>> >>>> >             Project: Hama
>>> >>>> >          Issue Type: Task
>>> >>>> >            Reporter: Edward J. Yoon
>>> >>>> >
>>> >>>> > BSP tasks can be used for aggregating log data streamed in real time.
>>> >>>> > With this research task, we might be able to build a platform for
>>> >>>> > this kind of processing.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> This message was sent by Atlassian JIRA
>>> >>>> (v6.2#6252)
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> ------
>>> >>> Yexi Jiang,
>>> >>> ECS 251,  yjian004@cs.fiu.edu
>>> >>> School of Computer and Information Science,
>>> >>> Florida International University
>>> >>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Edward J. Yoon (@eddieyoon)
>>> >> Chief Executive Officer
>>> >> DataSayer Co., Ltd.
>>>
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer Co., Ltd.
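
The flow quoted above (each group runs its function, sends to the next group, everyone syncs, repeat) can be simulated in plain Java without any Hama classes. This is only a sketch under stated assumptions: GroupPipelineSketch and run are hypothetical names, each group is modeled as one function from a message to a list of messages, and the barrier is modeled by making sent messages visible only in the following superstep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java simulation for illustration; all names are hypothetical.
final class GroupPipelineSketch {

  // Runs the groups for the given number of supersteps. A message sent by
  // group i in superstep s is processed by group i+1 in superstep s+1.
  // Messages emitted by the last group are collected as the result.
  static List<String> run(List<List<String>> inputPerSuperstep,
                          List<Function<String, List<String>>> groups,
                          int supersteps) {
    int n = groups.size();
    List<List<String>> inbox = emptyQueues(n);
    List<String> output = new ArrayList<>();
    for (int step = 0; step < supersteps; step++) {
      // New log lines, if any, arrive at the first group.
      if (step < inputPerSuperstep.size()) {
        inbox.get(0).addAll(inputPerSuperstep.get(step));
      }
      // Step 3: every group runs its function and sends to the next group.
      List<List<String>> sent = emptyQueues(n);
      for (int i = 0; i < n; i++) {
        for (String msg : inbox.get(i)) {
          List<String> produced = groups.get(i).apply(msg);
          if (i + 1 < n) {
            sent.get(i + 1).addAll(produced);
          } else {
            output.addAll(produced);
          }
        }
      }
      // Step 4: barrier; sent messages become visible in the next superstep.
      inbox = sent;
    }
    return output;
  }

  private static List<List<String>> emptyQueues(int n) {
    List<List<String>> queues = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      queues.add(new ArrayList<>());
    }
    return queues;
  }
}
```

With three groups (tail, split sentence, count), a line fed in at superstep 0 reaches the last group at superstep 2, so the pipeline depth shows up as a latency of one superstep per group. Checkpointing and FT are left out entirely.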



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer Co., Ltd.
