flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: Task Parallelism in a Cluster
Date Wed, 02 Dec 2015 08:39:31 GMT
If I'm not mistaken, then the scheduler has already a preference to spread
independent pipelines out across the cluster. At least he uses a queue of
instances from which it pops the first element if it allocates a new slot.
This instance is then appended to the queue again, if it has some resources
(slots) left.

I would assume that you have a shuffle operation involved in your job such
that it makes sense for the scheduler to deploy all pipelines to the same
machine.

Cheers,
Till
On Dec 1, 2015 4:01 PM, "Stephan Ewen" <sewen@apache.org> wrote:

> Slots are like "resource groups" which execute entire pipelines. They
> frequently have more than one operator.
>
> What you can try as a workaround is decrease the number of slots per
> machine to cause the operators to be spread across more machines.
>
> If this is a crucial issue for your use case, it should be simple to add a
> "preference to spread out" to the scheduler...
>
> On Tue, Dec 1, 2015 at 3:26 PM, Kashmar, Ali <Ali.Kashmar@emc.com> wrote:
>
> > Is there a way to make a task cluster-parallelizable? I.e. Make sure the
> > parallel instances of the task are distributed across the cluster. When I
> > run my flink job with a parallelism of 16, all the parallel tasks are
> > assigned to the first task manager.
> >
> > - Ali
> >
> > On 2015-11-30, 2:18 PM, "Ufuk Celebi" <uce@apache.org> wrote:
> >
> > >
> > >> On 30 Nov 2015, at 17:47, Kashmar, Ali <Ali.Kashmar@emc.com> wrote:
> > >> Do the parallel instances of each task get distributed across the
> > >>cluster or is it possible that they all run on the same node?
> > >
> > >Yes, slots are requested from all nodes of the cluster. But keep in mind
> > >that multiple tasks (forming a local pipeline) can be scheduled to the
> > >same slot (1 slot can hold many tasks).
> > >
> > >Have you seen this?
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/job
> > >_scheduling.html
> > >
> > >> If they can all run on the same node, what happens when that node
> > >>crashes? Does the job manager recreate them using the remaining open
> > >>slots?
> > >
> > >What happens: The job manager tries to restart the program with the same
> > >parallelism. Thus if you have enough free slots available in your
> > >cluster, this works smoothly (so yes, the remaining/available slots are
> > >used)
> > >
> > >With a YARN cluster the task manager containers are restarted
> > >automatically. In standalone mode, you have to take care of this
> yourself.
> > >
> > >
> > >Does this help?
> > >
> > >­ Ufuk
> > >
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message