storm-user mailing list archives

From Felix Dreissig <>
Subject Trident pipelining and transactional properties
Date Mon, 01 Feb 2016 19:29:49 GMT

I am currently trying to understand the parallelism properties and transactional semantics
offered by Trident and couldn’t find an answer to these two questions:

1. The "Trident Spouts" documentation [1] says that "[b]y default, Trident processes
a single batch at a time, waiting for the batch to succeed or fail before trying another batch".
But do Trident bolts always wait until a batch is completed, collect the results and then
pass them on to the next bolt(s) as complete batches? Without pipelining, this would mean
that only one bolt can be active at a time, effectively preventing any parallelism.
Or do tuples enter a stream and get delivered to the next bolt(s) as soon as they are
emitted? Without pipelining, this would still introduce some idle time and increased latency,
but it would at least seem to make better use of resources.

2. The idea that a state will only ever have to deal with a new batch or the batch immediately
before it (assuming transactional or opaque-transactional spouts) is at the core of Trident's
state model.
On this topic, the documentation section cited above promises that even with pipelining, "Trident
will order any state updates taking place in the topology among batches". Is this a special
guarantee for the built-in stateful operations (i.e. partitionPersist and persistentAggregate,
which, as far as I can see, uses partitionPersist internally), or can all bolts assume that they'll never
see any batch repeated except the latest one they processed?
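
To make the premise of question 2 concrete, here is a minimal sketch of the opaque-transactional update rule as I understand it from the Trident state docs: the state stores the txid of the last applied batch plus the values before and after it, so a replay of that same batch can be recomputed from the previous value. The class OpaqueCounter and its methods are hypothetical and are not Storm's API; the exception for an older txid simply encodes the assumption I am asking about.

```java
// Hypothetical, simplified model of opaque-transactional state updates.
// Not Storm's actual API; it only illustrates the "latest or newer batch" assumption.
public final class OpaqueCounter {
    private long currTxid = -1; // txid of the most recently applied batch
    private long prev = 0;      // value before that batch was applied
    private long curr = 0;      // value after that batch was applied

    /** Applies one batch's partial count under opaque semantics. */
    public void apply(long txid, long batchCount) {
        if (txid == currTxid) {
            // Replay of the latest batch: its contents may differ, so recompute from prev.
            curr = prev + batchCount;
        } else if (txid > currTxid) {
            // A new batch: remember the old value in case this batch is replayed.
            prev = curr;
            curr = curr + batchCount;
            currTxid = txid;
        } else {
            // The assumption in question: a state never sees a batch older than the latest one.
            throw new IllegalStateException("batch " + txid + " is behind state batch " + currTxid);
        }
    }

    public long value() {
        return curr;
    }

    public static void main(String[] args) {
        OpaqueCounter c = new OpaqueCounter();
        c.apply(1, 5); // batch 1
        c.apply(2, 3); // batch 2, first attempt
        c.apply(2, 4); // batch 2 replayed with different contents
        System.out.println(c.value()); // prints 9
    }
}
```

If arbitrary bolts could also rely on this ordering, the same pattern would presumably work for custom state outside partitionPersist, which is exactly what I am unsure about.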

I couldn’t find these questions covered in the docs or in previous discussions. So I tried
consulting the source code, but it’s not easily comprehensible with regard to such issues.
Any help would be highly appreciated.

Best regards,
