hadoop-hdfs-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Spark vs. Storm
Date Wed, 02 Jul 2014 19:59:09 GMT
Not exactly. There are, of course, major implementation differences, and
some subtle and high-level ones too.

My 2 cents:

Spark is essentially in-memory M/R, and it simulates streaming or real-time
distributed processing for large datasets by micro-batching. The gain in
speed and performance over the batch paradigm comes from in-memory buffering
or batching (and I am being a bit naive/crude in this explanation).
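
To make the micro-batching idea concrete, here is a minimal sketch in plain
Python. It is not Spark's API: the names micro_batches and word_counts are
made up for illustration, and I am batching by record count, whereas Spark
Streaming (as far as I know) cuts batches by time interval. The point is only
that the work runs once per small batch rather than once per record.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Group an iterable of records into lists of at most batch_size items.

    Hypothetical helper: batches by count, as a stand-in for a time window.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def word_counts(batch):
    """Count words across every line in one batch (a tiny batch-style job)."""
    counts = {}
    for line in batch:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# The incoming "stream" is handled as a sequence of small batch jobs.
lines = ["a b", "b c", "c d", "d e", "e"]
batch_results = [word_counts(batch) for batch in micro_batches(lines, 2)]
```

Each entry in batch_results covers one batch only, so results arrive with a
delay of up to one batch.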

Storm, on the other hand, supports stream processing even at the
single-record level (a record is known as a tuple in its lingo). You can do
micro-batching on top of it as well, using the Trident API, which is also
good for state maintenance if your business logic requires that. Storm is
more applicable where you want control down to a single record rather than a
set, collection or batch of records.
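
For contrast, a rough sketch of record-at-a-time processing in plain Python.
This is not Storm's actual API: the class CountingBolt and its process method
are hypothetical names, loosely borrowing Storm's "bolt" term. What it shows
is that every incoming tuple is handled, and a result emitted, as soon as it
arrives.

```python
class CountingBolt:
    """Hypothetical per-record handler; an illustration, not Storm's API."""

    def __init__(self):
        self.counts = {}   # running state kept across tuples
        self.emitted = []  # stand-in for sending results downstream

    def process(self, word):
        """Handle exactly one tuple and emit its updated count right away."""
        self.counts[word] = self.counts.get(word, 0) + 1
        self.emitted.append((word, self.counts[word]))


bolt = CountingBolt()
for word in ["a", "b", "a"]:
    bolt.process(word)  # one call per tuple, no waiting for a batch to fill
```

Here the running count is updated and visible after every single record,
which is the kind of per-record control described above.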

Having said that, Spark Streaming is trying to simulate Storm's extremely
granular approach but, as far as I recall, it is still built on top of core
Spark (basically another level of abstraction over core Spark constructs).

So given this, you can pick the framework which is more attuned to your

On Wed, Jul 2, 2014 at 3:31 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

>   Do these two projects do essentially the same thing? Is one better than
> the other?
