storm-user mailing list archives

From Svend Vanderveken <svend.vanderve...@gmail.com>
Subject Re: Strom research suggestions
Date Thu, 09 Jan 2014 15:46:31 GMT
Hey Tobias,


Nice project, I would have loved to play with something like Storm back in
my university days :)

Here's a topic that's been on my mind for a while (it concerns Storm's
Trident API):


* one core idea of distributed map reduce à la Hadoop was to perform as
much processing as possible close to the data: you execute the "map"
locally on each node where the data sits and do a first reduce there, then
you let only those partial results travel through the network and do one
last reduce centrally. You get a result without having your whole DB
travel the network every time

* Storm's groupBy + persistentAggregate + reducer/combiner give us similar
semantics: we map incoming tuples, then reduce them with the other tuples
in the same group and with the previously reduced value, which is stored in
the DB at regular intervals

* for each group, the operation above always happens on the same Storm task
(i.e. the same "place" in the cluster) and stores its ongoing state in the
"same place" in the DB, using the group value as primary key
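To make the three points above concrete, here is a minimal sketch in plain
Python. It is not Trident code: `local_combine`, `persist` and the dict used
as a "DB" are hypothetical stand-ins that only model the idea of combining a
batch locally and then merging the partial result into state keyed by group.

```python
from collections import Counter

def local_combine(batch):
    """Pre-aggregate one batch of (group, value) tuples where it arrives."""
    partial = Counter()
    for group, value in batch:
        partial[group] += value
    return partial

def persist(store, partial):
    """Merge a partial aggregate into the stored state, keyed by group."""
    for group, value in partial.items():
        store[group] = store.get(group, 0) + value
    return store

# Two incoming batches; the dict plays the role of the state store.
batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 1), ("c", 5)],
]
store = {}
for batch in batches:
    persist(store, local_combine(batch))

print(store)
```

Only one value per group leaves `local_combine` for each batch, which is the
part that keeps network and DB traffic low.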

I believe it might be worth investigating if the following pattern would
make sense:

* install a distributed state store (e.g. Cassandra) on the same nodes as
the Storm workers

* try to align the Storm partitioning triggered by the groupBy with
Cassandra's partitioning, so that under the usual happy circumstances (no
crash) the Storm reduction happens on the node where Cassandra stores
that particular primary key, avoiding the network hop for the
persistence.
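A toy model of that alignment, again in plain Python and under loud
assumptions: the node names, both hash functions and the single-owner
placement are made up for illustration. A real Cassandra cluster places keys
on a token ring with replication, and a real Storm topology would need a
custom grouping to follow it, so this only shows what "aligned" would mean.

```python
import zlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster

def store_owner(key):
    """Toy stand-in for the store's partitioner: node holding this key."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

def unaligned_task_node(key):
    """Toy grouping that knows nothing about the store's layout."""
    return NODES[sum(key.encode()) % len(NODES)]

def aligned_task_node(key):
    """Aligned grouping: run the reduction on the node owning the key."""
    return store_owner(key)

def remote_writes(route, keys):
    """Count keys reduced on a different node than the one storing them."""
    return sum(1 for k in keys if route(k) != store_owner(k))

keys = [f"user-{i}" for i in range(1000)]
print("unaligned remote writes:", remote_writes(unaligned_task_node, keys))
print("aligned remote writes:  ", remote_writes(aligned_task_node, keys))
```

With the aligned routing every write is local by construction; whether the
saving matters in practice is exactly what would need measuring.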


What do you think? Premature optimization? Does not make sense? Great idea?
Let me know :)


S




On Thu, Jan 9, 2014 at 3:00 PM, Tobias Pazer <tobiaspazer@gmail.com> wrote:

> Hi all,
>
> I have recently started writing my master thesis with a focus on storm, as
> we are planning to implement the lambda architecture in our university.
>
> As it's still not very clear to me where exactly it's worth diving in,
> I was hoping one of you might have some suggestions.
>
> I was thinking about a benchmark or something else to systematically
> evaluate and improve the configuration of storm, but I'm not sure if this
> is even worth the time.
>
> I think the more experienced of you definitely have further ideas!
>
> Thanks and regards
> Tobias
>
