I am executing a data stream application which uses rebalance. Basically I am counting words using "src -> split -> physicalPartitionStrategy -> keyBy -> sum -> print". I am running 3 examples, one without physical partition strategy, one with rebalance strategy , and one with partial partition strategy from .
I know that the keyBy operator actually kills what rebalance is doing because it splits the stream by key and if I have a stream with skewed key, one parallel instance of the operator after the keyBy will be overloaded. However, I was expecting that before the keyBy I would have a balanced stream, which is not happening.
Basically, I want to see the difference in records/sec between operators when I use rebalance or any other physical partition strategy. However, when I found no difference in the records/sec metrics of all operators when I am running 3 different physical partition strategies. Screenshots of Prometheus+Grafana are attached.
Maybe I am measuring the wrong operator, or maybe I am not using the rebalance in the right way, or I am not doing a good use case to test the rebalance transformation.
I am also testing a different physical partition to later try to implement the issue "FLINK-1725 New Partitioner for better load balancing for skewed data" . I am not sure, but I guess that all physical partition strategies have to be implemented on a KeyedStream.
DataStream<String> text = env.addSource(new WordSource());
// split lines in strings
DataStream<Tuple2<String, Integer>> tokenizer = text.flatMap(new Tokenizer());
// choose a partitioning strategy
DataStream<Tuple2<String, Integer>> partitionedStream = tokenizer);
DataStream<Tuple2<String, Integer>> partitionedStream = tokenizer.rebalance();
DataStream<Tuple2<String, Integer>> partitionedStream = tokenizer.partitionByPartial(0);
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez