flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Scheduling of GroupByKey and CombinePerKey operations
Date Thu, 18 Jan 2018 15:22:22 GMT
Hi Pawel,

This question might be better suited for the Beam user list.
Beam includes the Beam Flink runner, which translates Beam programs into
Flink programs.
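For reference, the runner is selected purely through the pipeline options.
A rough sketch against the Beam 2.2 Java SDK (the class name and the
parallelism value below are just examples, not taken from your setup):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class FlinkRunnerSketch {
      public static void main(String[] args) {
        // Parse --runner, --parallelism, etc. from the command line.
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        // Default parallelism for the translated Flink operators; how the
        // operators are chained and scheduled onto slots is decided by the
        // runner and by Flink itself.
        options.setParallelism(32);

        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms (GroupByKey, Combine.perKey, TextIO, ...) ...
        pipeline.run();
      }
    }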

Best,
Fabian

2018-01-18 16:02 GMT+01:00 Pawel Bartoszek <pawelbartoszek89@gmail.com>:

> Can I ask why some operations run on only one slot? I understand that file
> writes should happen on only one slot, but the GroupByKey operation could be
> distributed across all slots. I have around 20k distinct keys every
> minute. Is there any way to break this operator chain?
>
> I noticed that CombinePerKey operations that don't have an IO-related
> transformation are scheduled across all 32 slots.
>
>
> My cluster has 32 slots across 2 task managers, running Beam 2.2 and
> Flink 1.3.2.
>
> Operator chain:
> GroupByKey -> ParMultiDo(WriteShardedBundles) -> ParMultiDo(Anonymous) ->
> xxx.pipeline.output.io.file.WriteWindowToFile-SumPlaybackBitrateResult2/
> TextIO.Write/WriteFiles/Reshuffle/Window.Into()/Window.Assign.out ->
> ParMultiDo(ReifyValueTimestamp) -> ToKeyedWorkItem
>
> Start time: 2018-01-18, 13:56:28   End time: 2018-01-18, 14:37:14
> Duration: 40m 45s   Bytes received: 149 MB   Records received: 333,672
> Bytes sent: 70.8 MB   Records sent: 19   Parallelism: 32   Status: RUNNING
>
> Subtasks:
> Start Time            End Time  Duration  Bytes received  Records received  Bytes sent  Records sent  Attempt  Host  Status
> 2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
> 2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
> 2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
> 2018-01-18, 13:56:28  -         40m 45s   77.5 MB         333,683           2.21 MB     20            1        xxx   RUNNING
> 2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
> 2018-01-18, 13:56:28  -         40m 45s   2.30 MB         0                 2.21 MB     0             1        xxx   RUNNING
>
> Thanks,
> Pawel
>
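On the question above about breaking the operator chain: in a plain Flink
DataStream program, chaining is controlled with startNewChain() /
disableChaining() on an operator, or disabled globally on the execution
environment. As far as I know, the Beam Flink runner decides chaining during
translation, so these calls are not directly available from a Beam pipeline;
the sketch below is Flink-level only, with a placeholder source and functions:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ChainingSketch {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining();  // forbid chaining for the whole job

        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        lines
            .map(new MapFunction<String, String>() {
              @Override
              public String map(String value) {
                return value.trim();
              }
            })
            .startNewChain()    // begin a new chain at this operator ...
            .filter(value -> !value.isEmpty())
            .disableChaining()  // ... or keep this operator out of any chain
            .print();

        env.execute("chaining-sketch");
      }
    }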
