From: Bertrand Dechoux
To: user@hadoop.apache.org
Date: Mon, 20 Aug 2012 09:37:19 +0200
Subject: Re: Hadoop Real time help

The terms are:
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing

By the way, "processing streams in real time" is close to a pleonasm.

MapReduce follows a batch architecture: you keep data until a given time, then you process everything, and only at the end do you provide all the results.
Stream processing has, by definition, a 'smoother' throughput: events are processed one at a time, and each event can potentially lead to a result.
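
To make the contrast concrete, here is a minimal sketch in plain Python. It is not MapReduce or any stream engine; the function names and the integer events are assumptions chosen only for illustration. The batch function produces one result after all input is stored, while the stream function produces an updated result after every event.

```python
from typing import Iterable, Iterator, List


def batch_total(events: Iterable[int]) -> int:
    """Batch style: store every event first, then process them all at once."""
    stored: List[int] = list(events)  # keep data until the cut-off time
    return sum(stored)                # a single result at the very end


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream style: handle one event at a time, emitting a result each time."""
    total = 0
    for value in events:
        total += value
        yield total  # an up-to-date result after every event


if __name__ == "__main__":
    readings = [3, 1, 4]  # hypothetical event values
    print(batch_total(readings))          # one answer once all input is seen
    print(list(stream_totals(readings)))  # one answer per event
```

The latency difference follows directly: the batch caller waits for the whole input, whereas the stream caller can act on each intermediate result.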

I don't know of any complete overview of such tools. Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as Flume.
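
As a rough idea of the kind of continuous query an ESP engine evaluates over collected logs, here is a sketch of a sliding-window count in plain Python. It does not use Esper's or FlumeBase's actual query language or API; the function name, the (timestamp, level) event shape, and the "ERROR" level are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def errors_in_window(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """For each incoming ERROR event, yield (timestamp, errors in the window).

    `events` is assumed to be (timestamp, level) pairs in timestamp order.
    """
    recent: Deque[float] = deque()
    for timestamp, level in events:
        if level != "ERROR":
            continue  # this query only cares about error events
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - window_seconds:
            recent.popleft()
        yield timestamp, len(recent)


if __name__ == "__main__":
    log = [(0.0, "ERROR"), (1.0, "INFO"), (2.0, "ERROR"), (11.0, "ERROR")]
    for when, count in errors_in_window(log, window_seconds=10.0):
        print(when, count)
```

A real engine would let you declare such a window in a query and would manage time and state for you; the loop above only shows the underlying idea.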

Then you also have other solutions, such as Storm, which allow you to scale.
A few people have already considered using Storm for scalability and Esper to do the actual computation.
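
One way to picture that split, as a sketch only: a distribution layer routes events by key so that each worker sees every event for its keys, and each worker then runs its own local computation. The code below is plain Python, not Storm's API; `worker_for`, `partition`, and the (key, value) event shape are hypothetical names introduced for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]


def worker_for(key: str, worker_count: int) -> int:
    """Pick a worker with a stable hash, so a key always lands on the same worker."""
    return zlib.crc32(key.encode("utf-8")) % worker_count


def partition(events: Iterable[Event], worker_count: int) -> Dict[int, List[Event]]:
    """Route each (key, value) event to the worker responsible for its key."""
    routed: Dict[int, List[Event]] = defaultdict(list)
    for key, value in events:
        routed[worker_for(key, worker_count)].append((key, value))
    return dict(routed)


if __name__ == "__main__":
    events = [("host-a", 1), ("host-b", 2), ("host-a", 3)]  # hypothetical events
    for worker, assigned in sorted(partition(events, worker_count=2).items()):
        print(worker, assigned)
```

Because all events for a given key reach the same worker, per-key computations (such as a windowed count) can run independently on each worker, which is what makes adding workers useful.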

Regards

Bertrand

On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <niels@basj.es> wrote:

> Is there a "complete" overview of the tools that allow processing streams of data in real time?
>
> Or even better: what are the terms to google for?
>
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
>
> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <dechouxb@gmail.com> wrote:
>
>> That's a good question. More and more people are talking about Hadoop Real Time.
>> One key aspect of this question is whether we are talking about MapReduce or not.
>>
>> MapReduce greatly improves the response time of any data-intensive job, but it is still a batch framework with noticeable latency.
>>
>> There are multiple ways to improve the latency:
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Big Table clones (like HBase ...)
>> * YARN with a non-MapReduce application
>> * ...
>>
>> But it will really depend on the context and the definition of 'real time'.
>>
>> Regards
>>
>> Bertrand
>>
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <mahoutuser@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>> I am new to Hadoop. I would like to know how the Hadoop framework is useful for real-time services. Can anyone explain?
>>>
>>> Thanks.
>>
>> --
>> Bertrand Dechoux

--
Bertrand Dechoux