flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: DataStream.partitionCustom() - define parallelism
Date Wed, 20 Jul 2016 11:26:56 GMT
Hi,
I think that discussion applied only to the DataSet API. If I'm not mistaken,
changing the parallelism should work after a "partitionCustom()".

Cheers,
Aljoscha
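
A minimal plain-Java sketch of the point above, not Flink code. It assumes the partitioner is handed the parallelism of the downstream operation as its partition count, which is what the reply suggests for "partitionCustom()". The names CustomPartitionSketch, KeyPartitioner and route are hypothetical; KeyPartitioner only mirrors the shape of Flink's Partitioner<K> interface (partition(key, numPartitions)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation with no Flink dependency; all names are hypothetical.
class CustomPartitionSketch {

    // Same shape as Flink's Partitioner<K>: map a key to a channel in [0, numPartitions).
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Routes each record into one of `downstreamParallelism` channels.
    // Assumption: the partition count the partitioner sees is the parallelism
    // of the operation that consumes the partitioned stream.
    static <K> List<List<K>> route(List<K> records,
                                   KeyPartitioner<K> partitioner,
                                   int downstreamParallelism) {
        List<List<K>> channels = new ArrayList<>();
        for (int i = 0; i < downstreamParallelism; i++) {
            channels.add(new ArrayList<>());
        }
        for (K record : records) {
            int target = partitioner.partition(record, downstreamParallelism);
            channels.get(target).add(record);
        }
        return channels;
    }

    public static void main(String[] args) {
        KeyPartitioner<Integer> modulo = (key, n) -> Math.floorMod(key, n);
        List<Integer> records = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);

        // Downstream operation with parallelism 3: the same partitioner is
        // simply evaluated against 3 channels instead of 8.
        System.out.println(route(records, modulo, 3));
        System.out.println(route(records, modulo, 8));
    }
}
```

Under that assumption the custom partitioning stays valid for any downstream parallelism, because the partitioner itself decides the target channel for the new channel count.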

On Tue, 19 Jul 2016 at 19:25 Jaromir Vanek <vanek.jaromir@gmail.com> wrote:

> Aljoscha Krettek-2 wrote
> > Hi,
> > it should be possible to set the parallelism on the actual downstream
> > operation. The partitioning operation is just an intermediate.
> >
> > Cheers,
> > Aljoscha
>
> Are you sure about that? The discussion mentioned
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Range-partitioning-tp10947p10953.html
> >
> about the range partitioner says exactly the opposite:
>
>
> > A partitioning will only be valid to the point that you change the
> > parallelism.
> >
> > In the modified program the data will be correctly partitioned (let's say
> > into 8 partitions if the default parallelism is 8).
> > After the partitioning, the 8 partitions have to be reduced to 3
> > partitions as defined by the map-partition
> > operator with parallelism 3. This is done by random shuffling, which
> > destroys the range partitioning.
>
> It says that the downstream operation will use a "random shuffle" when
> changing parallelism. Is 'partitionCustom()' a different case?
>
>
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DataStream-partitionCustom-define-parallelism-tp12597p12617.html
> Sent from the Apache Flink Mailing List archive at Nabble.com.
>
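
The concern quoted from the range-partitioning discussion can be illustrated with a small plain-Java simulation. It is not Flink code. The class and method names are hypothetical, and a deterministic round-robin redistribution stands in for the "random shuffle" so that the result is reproducible. It range-partitions 24 values into 8 partitions, redistributes them into 3 without looking at the values, and checks whether the result is still range-partitioned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation with no Flink dependency; all names are hypothetical.
class RangeRepartitionSketch {

    static List<List<Integer>> emptyPartitions(int n) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    // Range-partitions values from [0, maxExclusive) into n contiguous ranges.
    static List<List<Integer>> rangePartition(List<Integer> values, int maxExclusive, int n) {
        List<List<Integer>> parts = emptyPartitions(n);
        for (int v : values) {
            parts.get((int) ((long) v * n / maxExclusive)).add(v);
        }
        return parts;
    }

    // Redistributes records into n partitions round-robin, ignoring their values
    // (a deterministic stand-in for the random shuffle in the quoted text).
    static List<List<Integer>> roundRobin(List<List<Integer>> input, int n) {
        List<List<Integer>> parts = emptyPartitions(n);
        int next = 0;
        for (List<Integer> part : input) {
            for (int v : part) {
                parts.get(next).add(v);
                next = (next + 1) % n;
            }
        }
        return parts;
    }

    // True if every value in a partition is greater than all values in earlier partitions.
    static boolean isRangePartitioned(List<List<Integer>> parts) {
        long previousMax = Long.MIN_VALUE;
        for (List<Integer> part : parts) {
            if (part.isEmpty()) {
                continue;
            }
            if (Collections.min(part) <= previousMax) {
                return false;
            }
            previousMax = Collections.max(part);
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 24; i++) {
            values.add(i);
        }
        List<List<Integer>> eight = rangePartition(values, 24, 8);
        System.out.println(isRangePartitioned(eight));                        // 8 ranges
        System.out.println(isRangePartitioned(roundRobin(eight, 3)));         // 8 -> 3, value-blind
        System.out.println(isRangePartitioned(rangePartition(values, 24, 3))); // ranges computed for 3
    }
}
```

The property survives only when the ranges are computed for the parallelism that actually consumes them, which is why it matters whether the partitioner sees the downstream parallelism.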
