storm-user mailing list archives

From Arnaud BOS <arnaud.t...@gmail.com>
Subject Parallelism, field grouping and windowing
Date Tue, 14 Nov 2017 15:51:09 GMT
Hi all,

Suppose I have a WindowedBolt subscribing to a KafkaSpout stream.
The WindowedBolt uses a sliding window of length 2 with a sliding interval of 1.
The stream from KafkaSpout to WindowedBolt is partitioned by field (fields
grouping).
To keep things simple, let's assume the stream gets partitioned into only two
groups, A and B, in the context of two supervisors with one worker process
each (setNumWorkers(2)):

By default, the number of tasks of a spout/bolt is equal to the parallelism
hint, correct?

If the WindowedBolt has a parallelism hint of 2 and 2 tasks (setNumTasks(2)
or not specified), one thread (executor) running exactly one task will run on
each worker/supervisor, correct?

Each executor/task will receive tuples from only one of the partitions; for
instance, executor/task 1 gets tuples from substream A and executor/task 2
gets tuples from substream B, correct?
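Here is the mental model behind that assumption (a simplification, not
Storm's actual code): the target task is picked by hashing the grouping field
values modulo the number of target tasks, so a given key always lands on the
same task as long as the number of tasks stays fixed.

import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {
    // Simplified idea only: the chosen task index is the hash of the grouping
    // field values modulo the number of target tasks, so tuples carrying the
    // same field values always map to the same task.
    static int chooseTask(List<Object> groupingFieldValues, int numTargetTasks) {
        return Math.floorMod(groupingFieldValues.hashCode(), numTargetTasks);
    }

    public static void main(String[] args) {
        // With 2 target tasks, every "A" tuple maps to one fixed task index
        // and every "B" tuple maps to one fixed task index.
        System.out.println(chooseTask(Arrays.asList("A"), 2));
        System.out.println(chooseTask(Arrays.asList("B"), 2));
    }
}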

Assuming the above is correct: if a node/supervisor fails, will the remaining
task start receiving tuples from the other half of the partitioned stream?
That is, if executor/task 2 (on supervisor 2) disappears, will tuples from A
and B be interleaved in the sliding window of executor/task 1? Or will
substream B just "hang" until another task is started to handle it?

All of this makes sense to me, and my use case might look counterproductive,
but I need to be sure of what's happening: my workflow is inherently
sequential and I don't want the "streams" to interleave.

Thanks!
