flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Katsipoulakis, Nikolaos Romanos" <kat...@cs.pitt.edu>
Subject RE: Custom Partitioning and windowing questions/concerns
Date Tue, 24 Jan 2017 16:18:51 GMT
Hello Fabian,

First, I would like to thank you for your suggestion and the additional information on determinism
and partition policies. As I mentioned on my initial email, I am new to Flink and every additional
piece of advice makes my “learning curve” less steep. In addition, I am aware that you
(and everyone else who follows this thread) might wonder why am I following this unconventional
path of performance partitioning, but, I have to inform you that my use-case’s goal is of
academic nature.

Turning to your suggestion, I took some time and go over version’s 1.2-SNAPSHOT code, and
I read the online documentation on the Process Function API which I found at: https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/process_function.html
. From my understanding, the process() transformation can be applied only on a KeyedStream<T>
and not on a DataStream<T>. Therefore, if I wanted to use a custom partition algorithm,
I would have to first make a call to partitionCustom() (DataStream<T> -> DataStream<T>),
followed by a keyBy(…) (DataStream<T> -> KeyedStream<T>), and finally apply
my first pre-aggregation step (i.e., call to process()). Concretely, my code would turn to
something like the following:
// Phase 1: parallel partial sum, with a parallelism of N (N > 1)
DataStream<Tuple3<Long, Integer, Integer> phaseOne = stream
                .partitionCustom(new CustomPartitioner(...)) // or .rebalance() or .shuffle()
                .process(new CustomProcessFunction(..., Time.seconds(10),...))

Unfortunately, you can understand that the above would be problematic for two reasons: First,
a call to keyBy() defeats the purpose of a custom partitioner, because stream will be (ultimately)
partitioned based on the keys and not on my CustomPartitioner.selectChannels() method. Second,
using process() does not solve my problem, because the issue with my use-case is to avoid
calling keyBy(). If I could do that, then I might as well call window()and not use the process
API in the first place. To be more precise, if I could use a KeyedStream<T>, then I
could do the following:

// Phase 1: parallel partial sum, with a parallelism of N (N > 1)
DataStream<Tuple3<Long, Integer, Integer> phaseOne = stream
                .partitionCustom(new CustomPartitioner(...))

Therefore, I don’t think using a Process Function would solve my problem. Am I understanding
your suggestion correctly? If yes, I would be grateful if you could explain to me in more
detail. On top of that, after reading my initial email again, I believe that the intentions
for my use-case were not quite clear. Please, do not hesitate to ask me for any clarifications.

Again, thank you very much for your interest and your time.

Kind Regards,

Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh

From: Fabian Hueske [mailto:fhueske@gmail.com]
Sent: Tuesday, January 24, 2017 5:15 AM
To: user@flink.apache.org
Subject: Re: Custom Partitioning and windowing questions/concerns

Hi Nikos,
Flink's windows require a KeyedStream because they use the keys to manage their internal state
(each in-progress window has some state that needs to be persisted and checkpointed).
Moreover, Flink's event-time window operators return a deterministic result. In your use-case,
the result of the pre-aggregation (phase 1) should not deterministic because it would depend
on the partitioning of the input.
I would suggest to implement the pre-aggregation not with a window but with a ProcessFunction
(available in Flink 1.2-SNAPSHOT which will be release soon).
ProcessFunction allows you to register timers which can be used to emit results every 10 seconds.
Hope this helps,

2017-01-23 17:50 GMT+01:00 Katsipoulakis, Nikolaos Romanos <katsip@cs.pitt.edu<mailto:katsip@cs.pitt.edu>>:
Hello all,

Currently, I examine the effects of stream partitioning on performance for simple state-full

My toy application for the rest of my question will be the following: A stream of non-negative
integers, each one annotated with a timestamp, and the goal is to get the top-10 most frequent
non-negative integers on tumbling windows of 10 seconds. In other words, my input is a stream
of tuples with two fields, Tuple2<Long, Integer>(timestamp, key), where key is the non-negative
integer value, and timestamp is used to assign each event to a window. The execution plan
I am considering is to have a first phase (Phase 1), where the stream is partitioned and the
partial aggregations are processed in parallel (set parallelism to N > 1). Afterwards,
the second phase (Phase 2) involves gathering all partial aggregations on a single node (set
parallelism to 1), and calculate the full aggregation for each key, order the keys based on
windowed frequency and outputs the top-10 keys for each window.

As I mentioned earlier, my goal is to compare the performance of different partitioning policies
on this toy application. Initially, I want to compare shuffle-grouping (round-robin) and hash-grouping
and then move on to different partitioning policies by using Flink’s CustomPartitioner API.
After reading Flink’s documentation, I managed to develop the toy application using hash-partitioning.
Below, I present the different parts of my code:

// Phase 0: input setup
DataStream<Tuple3<Long, Integer, Integer>> stream = env.fromCollection(…)
               .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<Long,
Integer>>() {
                    public long extractAscendingTimestamp(Tuple2<Long, Integer> event)
{ return event.f0; }
                }).map( (Tuple2<Long, Integer> e) -> new Tuple3<Long, Integer,
Integer>(e.f0, e.f1, 1));

On Phase 0, I collect the input stream, from an in-memory list, define the event timestamp
which will be used for windowing, and extend each event with a value of 1 for calculating
the appearance of each number on every window. Afterwards, for the parallel Phase 1, I use
hash partitioning by first using .keyBy() operation on the key of each tuple (i.e., field
1), followed by a .window() operation, to assign each tuple on a different window, and end
with a .sum(). My code for (parallel) Phase 1 is the following:

// Phase 1: parallel partial sum, with a parallelism of N (N > 1)
DataStream<Tuple3<Long, Integer, Integer> phaseOne = stream.keyBy(1).window(TumblingEventTimeWindows.of(Time.seconds(10)).sum(2).setParallelism(N);

Moving on to Phase 2, to aggregate all partial results of a single window in one operator
for producing the full aggregation, ordering based on frequency, and return the top-10 keys,
I have the following:

// Phase 2: serial full aggregation and ordering, with a parallelism of 1
DataStream<String> phaseTwo = phaseOne
                .apply(new AllWindowsFunction<Tuple3<Long, Integer, Integer>, String,
TimeWindow>() {
                    public void apply(TimeWindow window, Iterable<Tuple3<Long, Integer,
Integer>> values, Collector<String> out) throws Exception {
                        List<Integer> topTenValues = ...;
                        StringBuilder strBuilder = new StringBuilder();
                        for (Integer t : topTenValues)
                            strBuilder.append(Integer.toString(t) + “,”);

The previous code makes use of hash-partitioning for its parallel phase. From what I understand,
Flink allows the .window() operation only on a KeyedStream. Furthermore, the .customPartition()
method transforms a DataStream to a DataStream (and the same is true for .shuffle() which
round-robins events). Therefore, I am confused on how I can use a shuffle policy with windows.
One Idea that came to me is to provide an irrelevant field on the .keyBy() method, or define
my own KeySelector<IN, KEY> that will simulate shuffle grouping through key generation.
Unfortunately, I have two concerns regarding the previous alternatives: For the keyBy() approach,
I need to control the internal hashing mechanisms, which entails cherry-picking fields on
different workloads and performing an exhaustive search on the behavior of different random
fields (not practical). For the KeySelector<IN, KEY>approach, I need to maintain state
among different calls of getKey(), which (as far as I know) is not offered by the KeySelector<IN,
KEY> interface and I do not want to rely on external state that will lead to additional
overhead. Therefore, my first question is how will I be able to effectively use round-robin
grouping with windows on my toy application?

The bigger point I am trying to address revolves around custom partitioning policies and windows
in general. My understanding is that the benefit of a custom partitioning policy is to have
the ability to control the partitioning process based on a pre-defined set of resources (e.g.,
partitions, task slots etc.). Hence, I am confused on how I would be able to use partitionCustom()
followed by .window() on the (parallel) phase one, to test the performance of different execution
plans (i.e., partitioning policies).

I apologize for the long question, but I believe that I had to provide enough details for
the points/questions I currently have (highlighted with bold). Thank you very much for your

Kind Regards,

Nikos R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh

View raw message