flink-user mailing list archives

From Jerry Peng <jerry.boyang.p...@gmail.com>
Subject Re: Question about parallelism
Date Fri, 18 Aug 2017 21:12:14 GMT
I guess my previous question is also asking whether the parallelism is
set on the operator or on the "data stream".  Is there an implied
repartitioning when the parallelism changes from one operator to the next?

On Fri, Aug 18, 2017 at 2:08 PM, Jerry Peng <jerry.boyang.peng@gmail.com> wrote:
> Hello all,
>
> I have a question about parallelism and partitioning in the
> DataStream API.  In Flink, a user can set the parallelism of a data
> source as well as of operators.  So when I set the parallelism of a
> data source, e.g.
>
> DataStream<String> text =
>     env.readTextFile(params.get("input")).setParallelism(5);
>
> does this mean that the resulting "text" DataStream is going to be
> partitioned into 5 partitions, or does it mean that there are going to
> be 5 parallel tasks running for this stage?
>
> If the next operator is:
>
> DataStream<Tuple2<String, Integer>> counts =
>     text.flatMap(new Tokenizer()).setParallelism(10);
>
> with the parallelism set to 10, are there 10 parallel tasks
> consuming from the 5 partitions?  And how is the resulting "counts"
> DataStream partitioned?  Into 10 partitions?
>
> Thanks in advance!
>
> Best,
>
> Jerry
