flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Getting the NumberOfParallelSubtask
Date Tue, 21 Jun 2016 10:11:57 GMT
Hi Robert,

The number of parallel subtasks is the parallelism of the job or of the
individual operator. Only when Flink runs locally does the parallelism
default to the number of CPU cores.
The number of groups generated by the groupBy() transformation doesn't
affect the parallelism. Very often the number of groups is much higher than
the parallelism; in those cases, each parallel instance processes
multiple groups.
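
To illustrate the last point, here is a minimal plain-Java sketch (no Flink
dependency) of how many groups can end up on each parallel instance. It
assumes a simplified "hashCode modulo parallelism" assignment; Flink's actual
partitioner uses its own hashing, so the real assignment will differ. The
class and method names are made up for this example.

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupDistribution {

    // Simplified stand-in for hash partitioning (an assumption for this
    // sketch, not Flink's real partitioner): pick a subtask from the key's hash.
    public static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Counts how many distinct groups (keys 0..numGroups-1) land on each subtask.
    public static Map<Integer, Integer> groupsPerSubtask(int numGroups, int parallelism) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int key = 0; key < numGroups; key++) {
            counts.merge(subtaskFor(key, parallelism), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 groups, parallelism 8: every subtask handles 125 groups.
        System.out.println(groupsPerSubtask(1000, 8));
        // prints {0=125, 1=125, 2=125, 3=125, 4=125, 5=125, 6=125, 7=125}
    }
}
```

With fewer groups than parallel instances (say 3 groups and a parallelism of
8), some instances simply receive no group at all, but the parallelism itself
is still 8.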

If you want to know the parallelism of all your operators up front, you'll
need to set it explicitly (for example, set every operator to a parallelism
of 8).
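
A rough sketch of what that could look like for the job quoted below. This is
a fragment, not a complete program: it assumes the DataSet API, the `input`
data set and the MR_GPMRS_* classes from the original mail, and a hypothetical
constructor argument on the mapper that carries the reducer parallelism.

```java
// Fragment only: requires Flink's DataSet API on the classpath.
final int parallelism = 8; // example value

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(parallelism); // default parallelism for all operators of this job

DataSet<?> output = input
        // hypothetical constructor argument: tells the mapper what the
        // reducer's parallelism will be, since it cannot query it at runtime
        .mapPartition(new MR_GPMRS_Mapper(parallelism))
        .groupBy(0)
        .reduceGroup(new MR_GPMRS_Reducer())
        .setParallelism(parallelism); // set explicitly so the value passed above holds
```

Because the value is fixed in one place, the mapper can rely on it instead of
trying to read the parallelism of a neighbouring operator.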

Regards,
Robert


On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler <chesnay@apache.org>
wrote:

> Within the mapper you cannot access the parallelism of the following or
> the preceding operation.
>
>
> On 20.06.2016 15:56, Paschek, Robert wrote:
>
>> Hi Mailing list,
>>
>> Using a RichMapPartitionFunction, I can access the total number m of
>> parallel instances of this mapper used in my job with
>> int m = getRuntimeContext().getNumberOfParallelSubtasks();
>>
>> I think that would be - in general - the total number of CPU cores used
>> by Apache Flink across the cluster.
>>
>> Is there a way to access the number of instances of the following reducer?
>>
>> In general I would assume that the number of following reducers
>> depends on the number of groups generated by the groupBy() transformation.
>> So the number of reducers r would satisfy 1 <= r <= m.
>>
>> My Job:
>> DataSet<?> output = input
>>         .mapPartition(new MR_GPMRS_Mapper())
>>         .groupBy(0)
>>         .reduceGroup(new MR_GPMRS_Reducer());
>>
>> Thank you in advance
>> Robert
>>
>
>
