Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of dechouxb@gmail.com designates
 209.85.216.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CADoiZqo2E84ePuuJwC1fWFV9FTxfZs7yhC3Q+m0uNQA1zbNKQg@mail.gmail.com>
References: 
 <CALCQx_-Adw8D3m=RJCs6eMiz1nWx_wiU7Wj58rRZrgH6x3NL5w@mail.gmail.com>
	<CAO6W-2fUMH0TvZ=_Dw-QHffo9=B2Ok177JiBc_31bFjJShP=4A@mail.gmail.com>
	<CADoiZqo2E84ePuuJwC1fWFV9FTxfZs7yhC3Q+m0uNQA1zbNKQg@mail.gmail.com>
Date: Mon, 20 Aug 2012 09:37:19 +0200
Message-ID: 
 <CAO6W-2ft4VXB1UUonC=PXsgY+hd_BDeXJkCZoAniqJxZyJ301g@mail.gmail.com>
Subject: Re: Hadoop Real time help
From: Bertrand Dechoux <dechouxb@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00248c6a66c25afe9704c7ad946a

--00248c6a66c25afe9704c7ad946a
Content-Type: text/plain; charset=ISO-8859-1

The terms are
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing

By the way, "processing streams in real time" is close to a pleonasm.

MapReduce follows a batch architecture. You accumulate data until a given
time, then process everything, and only at the end do you provide all the
results. Stream processing has by definition a 'smoother' throughput: events
are processed one at a time, and each one can potentially lead to a
result.
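
To make the difference concrete, here is a minimal sketch in plain Python.
It is not Hadoop, Esper or Storm code, and the function names are made up
for illustration only. The batch function returns one result once all the
data is in; the streaming function emits an updated result after every event.

```python
from typing import Iterable, Iterator, List


def batch_total(events: Iterable[int]) -> int:
    """Batch style: collect everything first, then compute one final result."""
    stored: List[int] = list(events)  # keep the data until the cut-off time
    return sum(stored)                # then process everything at once


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream style: handle one event at a time, emitting a result after each."""
    total = 0
    for value in events:
        total += value
        yield total                   # an up-to-date result per event


if __name__ == "__main__":
    events = [3, 1, 4]
    print(batch_total(events))             # a single result at the end
    print(list(streaming_totals(events)))  # one running result per event
```

The computation is the same in both cases; what changes is when results
become available, which is where the latency difference comes from.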

I don't know of any complete overview of such tools.
Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as
Flume.

Then you also have other solutions that will allow you to scale, such as
Storm.
A few people have already considered using Storm for scalability and Esper
to do the actual computation.
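
A rough sketch of that split, again in plain Python and using neither the
Storm nor the Esper API. The event shape, the class name and the hash
function are assumptions made up for the example. One layer routes events
so that the same key always reaches the same worker (the scaling part); each
worker then runs a small sliding-window rule (the computation part).

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# Hypothetical event shape: (timestamp in seconds, key, value).
Event = Tuple[float, str, float]


def partition_for(key: str, workers: int) -> int:
    """Scaling layer: send all events with the same key to the same worker."""
    return sum(key.encode("utf-8")) % workers  # simple deterministic hash


class WindowAverage:
    """Computation layer: per-key average over the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.recent: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)

    def on_event(self, event: Event) -> float:
        ts, key, value = event
        items = self.recent[key]
        items.append((ts, value))
        # Drop events that have fallen out of the time window.
        while items and items[0][0] <= ts - self.window:
            items.popleft()
        return sum(v for _, v in items) / len(items)


def run(events: List[Event], workers: int, window: float) -> List[Tuple[int, str, float]]:
    """Route each event to its worker and collect (worker, key, average)."""
    engines = [WindowAverage(window) for _ in range(workers)]
    results = []
    for event in events:
        worker = partition_for(event[1], workers)
        results.append((worker, event[1], engines[worker].on_event(event)))
    return results
```

Because routing depends only on the key, each worker sees the full history
for its keys and the windowed rule needs no shared state between workers.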

Regards

Bertrand

On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <niels@basj.es> wrote:

> Is there a "complete" overview of the tools that allow processing streams
> of data in realtime?
>
> Or even better; what are the terms to google for?
>
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <dechouxb@gmail.com> wrote:
>
>> That's a good question. More and more people are talking about Hadoop
>> Real Time.
>> One key aspect of this question is whether we are talking about MapReduce
>> or not.
>>
>> MapReduce greatly improves the response time of data-intensive jobs
>> but it is still a batch framework with a noticeable latency.
>>
>> There are multiple ways to improve the latency:
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Big Table clones (like HBase ...)
>> * YARN with a non-MapReduce application
>> * ...
>>
>> But it will really depend on the context and the definition of 'real
>> time'.
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <mahoutuser@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>>
>>>    I am new to Hadoop. I would like to know how the Hadoop framework
>>> is useful for real-time services. Can anyone explain?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>


-- 
Bertrand Dechoux

--00248c6a66c25afe9704c7ad946a--