hadoop-hdfs-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: Spark vs. Storm
Date Wed, 02 Jul 2014 20:07:26 GMT
Spark Streaming discretizes the stream into batches at a configurable
interval of no less than 500 milliseconds, so it is not appropriate for true
real-time processing. If you need to handle events within the low hundreds of
milliseconds or less, then stick with Storm (at least for now).
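
As a rough sketch of why the batch interval puts a floor on latency, here is
the discretization idea in plain Python. This is not the Spark Streaming API;
the function names and the sample timestamps are made up for illustration, and
it assumes batches cover [k * interval, (k + 1) * interval) and that nothing
in a batch is processed before the batch closes.

```python
def batch_close_time_ms(event_time_ms, batch_interval_ms):
    """Return the time at which the micro-batch holding this event closes.

    Assumption: batches cover [k * interval, (k + 1) * interval), so an
    event that lands exactly on a boundary belongs to the next batch.
    """
    return (event_time_ms // batch_interval_ms + 1) * batch_interval_ms


def discretize(event_times_ms, batch_interval_ms):
    """Group event timestamps into micro-batches keyed by batch close time."""
    batches = {}
    for t in event_times_ms:
        close = batch_close_time_ms(t, batch_interval_ms)
        batches.setdefault(close, []).append(t)
    return batches


def worst_case_wait_ms(event_times_ms, batch_interval_ms):
    """Longest time any event waits before its batch can even start."""
    return max(
        batch_close_time_ms(t, batch_interval_ms) - t for t in event_times_ms
    )


if __name__ == "__main__":
    events = [10, 120, 499, 500, 730]  # hypothetical arrival times in ms
    print(discretize(events, 500))
    print(worst_case_wait_ms(events, 500))
```

With a 500 ms interval, an event can sit for up to a full interval before its
batch is handed off, and the batch's own processing time comes on top of that.
A per-record system does not have this waiting step.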

If you can afford a second or more of latency, then Spark provides the
advantage of interoperability with the other Spark components and capabilities.

2014-07-02 12:59 GMT-07:00 Shahab Yunus <shahab.yunus@gmail.com>:

> Not exactly. There are of course major implementation differences, and
> some subtle, high-level ones too.
> My 2 cents:
> Spark is in-memory M/R, and it simulates streaming or real-time distributed
> processing of large datasets by micro-batching. The gain in speed and
> performance over the batch paradigm comes from in-memory buffering or
> batching (and I am being a bit naive/crude in this explanation).
> Storm, on the other hand, supports stream processing even at the level of
> a single record (known as a tuple in its lingo). You can do micro-batching
> on top of it as well (using the Trident API, which is also good for state
> maintenance, if your business logic requires that). This is more applicable
> where you want control down to a single record rather than a set,
> collection, or batch of records.
> Having said that, Spark Streaming is trying to simulate Storm's extremely
> granular approach, but as far as I recall it is still built on top of core
> Spark (basically another level of abstraction over core Spark constructs).
> So given this, you can pick the framework that is better attuned to your
> needs.
> On Wed, Jul 2, 2014 at 3:31 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>> Do these two projects do essentially the same thing? Is one better
>> than the other?
