flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jaromir Vanek <vanek.jaro...@gmail.com>
Subject Re: DataStream.partitionCustom() - define parallelism
Date Tue, 19 Jul 2016 09:50:35 GMT
Aljoscha Krettek-2 wrote
> Hi,
> it should be possible to set the parallelism on the actual downstream
> operation. The partitioning operation is just an intermediate.
> Cheers,
> Aljoscha

Are you sure about that? The  mentioned discussion
about range partitioner says exactly opposite:

> A partitioning will only be valid to the point that you change the
> parallelism.
> In the modified program the data will be correctly partitioned (lets say
> into 8 partitions if the default parallelism is 8).
> After the partitioning, the 8 partitions have to be reduced to 3
> partitions as defined by the map-partition
> operator with parallelism 3. This is done by randomly shuffling which
> destroys the range-partitioning.

It says the the downstream operation will use "random shuffle" when changing
parallelism. Is 'customPartition()' different case?

View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DataStream-partitionCustom-define-parallelism-tp12597p12617.html
Sent from the Apache Flink Mailing List archive. mailing list archive at Nabble.com.

View raw message