flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "m@xi" <makisnt...@gmail.com>
Subject Re: Maintain heavy hitters in Flink application
Date Wed, 31 Jan 2018 13:27:27 GMT
Hello everyone and Happy New Year!

Regarding the Heavy Hitter tracking...I wanna do it in a distributed manner. 

Thus,
1 -- Round Robin the input stream to a number of parallel map instances (say
p = env.parallelism)
2 -- Each one of the p mappers maintains approximately the HH of its
corresponding portion of the input, utilizing an algorithm like Space
Saving, Misha-Gries etc etc.
3 -- Every now and then I would like to concatenate the state of all the p
mappers into one, thus producing the global Space Saving summary for the
entire input stream.
4 -- Due to the fact that I wanna balance out things given to the p mappers
in the beginning, I wanna use rebalance(), i.e. round robin algorithm -->
Thus, its is not possible to use Keyed State.
5 -- So, I am going to use ListCheckpointed state as described in [1].
6 -- When the "every now and then" happens, I wanna merge the partial
summaries and I will emit them through a side output, as described in [2].

The question is the following: [1] shows an example of state-redistribution.
So...can I change the parallelism of the p instance parallel .map() from
within the operator, and merge the summaries for the HH there just before
emitting them to the side output???

Essentially, how should I implement the 6th bullet is my question.

Any advice, on it or on the general guideline implementation for getting the
aforementioned thing done, is more than welcome.

Cheer,
Max

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/side_output.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Mime
View raw message