storm-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject Svend's blog - several questions
Date Wed, 05 Feb 2014 17:22:02 GMT
I've read Svend's blog post (http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/)
multiple times and I have a few questions.


"Because we did a groupBy on one tuple field, each List contains here one single
String: the correlationId. Note that the list we return must have exactly the same
size as the list of keys, so that Storm knows what period corresponds to what key.
So for any key that does not exist in DB, we simply put a null in the resulting list."
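
To make the contract in that quote concrete, here is a minimal sketch in plain Java. It is
not the blog's code and does not use the Trident API: the class MultiGetSketch, the
in-memory Map standing in for the DB, and the sample correlation IDs are all hypothetical.
It only illustrates one way a lookup could return one result per key, in key order, with
null in the slot of each missing key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiGetSketch {
    // Hypothetical stand-in for a state lookup: one result per key, in the same order.
    static List<String> multiGet(Map<String, String> db, List<List<Object>> keys) {
        List<String> results = new ArrayList<>(keys.size());
        for (List<Object> key : keys) {
            // Grouping on a single field gives one-element keys: the correlationId.
            String correlationId = (String) key.get(0);
            // Map.get returns null for a missing key, so the null lands at this key's index.
            results.add(db.get(correlationId));
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample data: "c2" is deliberately absent from the fake DB.
        Map<String, String> db = new HashMap<>();
        db.put("c1", "period-1");
        db.put("c3", "period-3");
        List<List<Object>> keys = Arrays.asList(
            Arrays.<Object>asList("c1"),
            Arrays.<Object>asList("c2"),
            Arrays.<Object>asList("c3"));
        System.out.println(multiGet(db, keys)); // prints [period-1, null, period-3]
    }
}
```

Because the result is appended while iterating over the keys, results.get(i) always
corresponds to keys.get(i), whether or not the key was found.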

Q1: Do the db keys come only from groupBy?
Q2: Can you group by multiple keys, e.g. .groupBy("name").groupBy("id")?
Q3: When we add null, we keep the results list the same size as the keys list, but I
don't understand how we make sure that key(3) points to the correct result(3).
After all, we're adding nulls at the end of the results list, not intermittently. That is, if
key(1) does not have an entry in the DB and the key list has size 5, we add null at the last
position in results, not at results(1). This doesn't preserve order, so key(1) now
gives a result(1) that is not null, although it should be. Is the code incorrect, or is the
explanation on Svend's blog incorrect?


Moving on,
"Once this is loaded Storm will present the tuples having the same correlation ID
one by one to our reducer, the PeriodBuilder"

Q4: Does Trident/Storm call the reducer after calling multiGet and before calling multiPut?
Q5: What params (and their types) are passed to the reducer and what parameters should it
emit so they can go into multiGet?

Q6: The first time the program is run, the database is empty and multiGet will return nothing.
Does the reducer need to handle this case and insert a new value instead of updating an
existing one? I do see that the reducer (TimelineUpdater) checks for nulls, and I'm guessing
this is the reason why.


Q7:
Can someone explain what these mean:
.each  (I've seen this used even consecutively: .each(..).each(..) )
.newStream
.newValuesStream
.persistentAggregate

I am unable to find Javadocs that document these method signatures.
These Javadocs don't help much: http://nathanmarz.github.io/storm/doc/storm/trident/Stream.html


Q8:
Storm has ack/fail; does Trident handle that automatically?


Q9: Has anyone tried Spark? http://spark.incubator.apache.org/streaming/
I'm wondering if anyone has tried it because I'm thinking of ditching Storm and moving to
that.
It seems much, much, much better documented.


Lots of questions, I know. Thanks for reading!

-Adrian

