hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama
Date Wed, 05 Mar 2014 02:32:44 GMT
I'm thinking about coupling it with incremental ML algorithms.
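
A minimal sketch of what such a coupling could look like, assuming micro-batches of log lines are processed in BSP-style supersteps. It is plain Python rather than the Hama API, and every name in it (count_errors, MasterAggregator, RunningStats, the "host LEVEL message" line format, the 3-sigma threshold) is a hypothetical choice made for illustration. Each peer runs a user-defined function over its slice of the stream, the master aggregator merges the partial results at the barrier, and an incremental estimator (Welford's running mean/variance) flags unusual error counts without storing past batches.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class RunningStats:
    """Incrementally updated mean/variance (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def count_errors(lines: Iterable[str]) -> Counter:
    """Hypothetical user-defined function run by one peer on its slice.

    Assumes each line looks like "<host> <LEVEL> <message>".
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            counts[parts[0]] += 1
    return counts


@dataclass
class MasterAggregator:
    """Hypothetical master: merges peer results once per superstep."""
    stats: RunningStats = field(default_factory=RunningStats)
    threshold: float = 3.0  # assumed: flag totals above mean + 3 * std

    def superstep(self, partials: List[Counter]) -> Tuple[Counter, bool]:
        # Barrier reached: combine the per-peer error counts.
        merged: Counter = Counter()
        for partial in partials:
            merged.update(partial)
        total = sum(merged.values())

        # Compare against the history first, then fold this batch into it.
        limit = self.stats.mean + self.threshold * self.stats.std()
        anomalous = self.stats.n >= 2 and total > limit
        self.stats.update(total)
        return merged, anomalous
```

In this sketch the master keeps only three numbers of state between supersteps, which is what makes the estimator incremental; a real job would replace the in-memory lists with the stream input and the framework's own messaging.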

On Wed, Mar 5, 2014 at 11:16 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
> I once implemented a system monitor/log collector using ActiveMQ and a
> real-time anomaly detection algorithm on top of Twitter's Storm. I think
> people like me would naturally choose such a streaming computing framework
> to handle this scenario.
>
> For real-time computation, what are the unique characteristics of Hama that
> would make people choose it instead of Storm? In my humble opinion, one
> unique characteristic of Hama is that it provides a general BSP computing
> framework (compared with Giraph, which provides BSP specifically for graph
> computing). No other framework offers this.
>
>
> 2014-03-04 21:02 GMT-05:00 Edward J. Yoon <edwardyoon@apache.org>:
>
>> The final goal can be a real-time event processing framework for
>> distributed event detection, filtering, and aggregation. I guess that
>> can be done with only 3 components:
>>
>>  * Event processing job configuration interface.
>>  * User-defined function that handles the stream input.
>>  * Master Aggregator(s) and its client library.
>>
>> I expect this can be applied to tasks such as web clickstream log
>> analysis (large-scale web servers), finding hot search keywords, and
>> detecting system errors in real time, and users will be able to program
>> them in a few minutes.
>>
>>
>> On Wed, Mar 5, 2014 at 10:30 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>> > Please correct me if I'm wrong. My understanding of log aggregation is
>> > that the logs generated by each monitored machine are collected in real
>> > time. The collection is continuous, like a data stream, and never ends.
>> >
>> > I know how to use Hama to aggregate the logs batch by batch (e.g.
>> > aggregating the logs incrementally each day), but I cannot immediately
>> > come up with a way to use Hama to solve this problem in real time.
>> >
>> >
>> > 2014-03-04 19:32 GMT-05:00 Edward J. Yoon <edwardyoon@apache.org>:
>> >
>> >> The aggregators in the Graph package do similar work: monitoring,
>> >> global communication, etc.
>> >>
>> >>
>> >>
>> >> On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>> >> > I am very interested in this topic since my research area includes
>> >> > event mining, but can BSP perform real-time computation?
>> >> >
>> >> > I once used a message-queue-based solution to collect event logs.
>> >> >
>> >> >
>> >> > 2014-03-04 1:54 GMT-05:00 Edward J. Yoon (JIRA) <jira@apache.org>:
>> >> >
>> >> >>
>> >> >>      [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> >> >>
>> >> >> Edward J. Yoon updated HAMA-883:
>> >> >> --------------------------------
>> >> >>
>> >> >>     Summary: [Research Task] Massive log event aggregation in real time
>> >> >> using Apache Hama  (was: [Research Task] Massive log data aggregation in
>> >> >> real time using Apache Hama)
>> >> >>
>> >> >> > [Research Task] Massive log event aggregation in real time using Apache Hama
>> >> >> >
>> >> >> > ----------------------------------------------------------------------------
>> >> >> >
>> >> >> >                 Key: HAMA-883
>> >> >> >                 URL: https://issues.apache.org/jira/browse/HAMA-883
>> >> >> >             Project: Hama
>> >> >> >          Issue Type: Task
>> >> >> >            Reporter: Edward J. Yoon
>> >> >> >
>> >> >> > BSP tasks can be used for aggregating log data streamed in real time.
>> >> >> > With this research task, we might be able to build a platform for this
>> >> >> > kind of processing.
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> This message was sent by Atlassian JIRA
>> >> >> (v6.2#6252)
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > ------
>> >> > Yexi Jiang,
>> >> > ECS 251,  yjian004@cs.fiu.edu
>> >> > School of Computer and Information Science,
>> >> > Florida International University
>> >> > Homepage: http://users.cis.fiu.edu/~yjian004/
>> >>
>> >>
>> >>
>> >> --
>> >> Edward J. Yoon (@eddieyoon)
>> >> Chief Executive Officer
>> >> DataSayer, Inc.
>> >>
>> >
>> >
>> >
>> > --
>> > ------
>> > Yexi Jiang,
>> > ECS 251,  yjian004@cs.fiu.edu
>> > School of Computer and Information Science,
>> > Florida International University
>> > Homepage: http://users.cis.fiu.edu/~yjian004/
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer, Inc.
>>
>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.
