flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Congxian Qiu <qcx978132...@gmail.com>
Subject Re: Partitioning key range
Date Sat, 06 Apr 2019 09:27:04 GMT
Hi Davood
Maybe a custom KeySelector can be helpful, you can define the key used to partition the stream.
You can ref the code[1] for detail.

[1] https://github.com/apache/flink/blob/8d05e91945c6c8d83f9924c00890ccf350f1f36f/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/partitioner/KeyGroupStreamPartitioner.java#L58

Best, Congxian
On Apr 5, 2019, 06:35 +0800, Davood Rafiei <rafieidavood9@gmail.com>, wrote:
> Hi all,
> I partition DataStream (say dsA) with parallelism 2 and get KeyedStream (say ksA) with
parallelism 2.
> Depending on my keys in dsA, one partition remains empty in ksA.
> For example when my keys are 10 and 20 in dsA, then both partitions in ksA are full.
> However, with keys 1000 and 1001, only one partition receives all of the upstream data
in ksA.
> Is there any way to get information about key ranges for each downstream partitions?
> Or is there any way to overcome this issue?
> We can assume that I know all possible keys (in this case 2 different keys) in dsA and
therefore I want all partitions in ksA to be fully utilized.
> Thanks,
> Davood

View raw message