flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Metzger <rmetz...@apache.org>
Subject Re: Not enough free slots to run the job
Date Mon, 21 Mar 2016 13:15:55 GMT
Hi Ovidiu,

right now the scheduler in Flink will not use more slots than requested.
To avoid issues on recovery, we usually recommend users to have some spare
slots (run job with p=15 on a cluster with slots=20). I agree that it would
make sense to add a flag which allows a job to grab more slots if they are
available. The problem with that is however, that jobs can currently not
change their parallelism. So if a job fails, it can not downscale to
restart on the remaining slots.
That's why the spare slots approach is currently the only way to go.

Regards,
Robert

On Fri, Mar 18, 2016 at 1:30 PM, Ovidiu-Cristian MARCU <
ovidiu-cristian.marcu@inria.fr> wrote:

> Hi,
>
> For the situation where a program specify a maximum parallelism (so it is
> supposed to use all available task slots) we can have the possibility that
> one of the task managers is not registered for various reasons.
> In this case the job will fail for not enough free slots to run the job.
>
> For me this means the scheduler has a limitation to work by statically
> assign tasks to the task slots the job is configured.
>
> Instead I would like to be able to specify a minimum parallelism of a job
> but also the possibility to dynamically use more task slots if additional
> task slots can be used.
> Another use case will be that if during the execution of a job we lose one
> node so some task slots, if the minimum parallelism is still ensured, the
> job should recover and continue its execution instead of just failing.
>
> Is it possible to make such changes?
>
> Best,
> Ovidiu

Mime
View raw message