flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Flink's multi-user support
Date Wed, 13 May 2015 09:41:00 GMT
On first thought, the sessions and the multi-job vs. job queue question are
almost two separate issues.

Can you add the sessions without removing the concurrent jobs we currently
have?

On Wed, May 13, 2015 at 10:34 AM, Maximilian Michels <mxm@apache.org> wrote:

> I think we can agree that real multi-user support in Flink (standalone) is
> neither desirable, because there are already sophisticated solutions out
> there (YARN or Mesos), nor feasible because it is a lot of work to get it
> right.
>
> As things stand, resource sharing between two users who submit jobs at the
> same time is not properly handled. However, this discussion showed that it
> is desirable to support submitting multiple jobs to a single Flink cluster.
> This could be realized with a simple queuing system in which jobs are
> executed one after another.
>
> With the soon-to-be-supported resuming of jobs from intermediate results,
> this should still enable multiple clients to refer to past jobs. The job
> manager simply holds a list of old ExecutionGraphs for each user session.
> When the user ends the session or a timeout occurs, the corresponding
> graphs are archived. This amounts to a form of session management.
>
> tl;dr I propose to drop the multi-user support that we have now. Instead,
> let's have a one-job-at-a-time usage model with a queuing system and
> eventually a session management to deal with resuming from already
> materialized results.
>
> What do you think?
>
> On Thu, Apr 30, 2015 at 11:09 AM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
> > There was an attempt to build such a queue during the Dopa project when
> > Flink was still Stratosphere.
> > Probably it could be a good idea to collect the good and bad things
> > learned from it to start designing the new scheduler :)
> >
> > On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > > Most components are written multi-job aware.
> > >
> > > The only thing that is not in there right now is scheduling policies
> > > for fair resource sharing. This is important in shared clusters.
> > >
> > > Since YARN implements all those things (various job queues with
> > > different priorities/policies etc), I suggest to not try and re-build
> > > it in Flink and simply declare a JobManager a "single-job-at-a-time"
> > > manager. You can still run an interactive session with many jobs one
> > > after another.
> > >
> > >
> > > On Wed, Apr 29, 2015 at 7:07 PM, Maximilian Michels <mxm@apache.org>
> > > wrote:
> > >
> > > > >
> > > > > However, dropping it completely instead of improving it would make
> > > Flink
> > > > > setups on dedicated clusters quite useless, right?
> > > > >
> > > >
> > > > Not really, because you could also use YARN on dedicated clusters for
> > > > proper multi-user support.
> > > >
> > > > > On Wed, Apr 29, 2015 at 5:51 PM, Fabian Hueske <fhueske@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I agree that Flink's multi-user support is not very good at the
> > > > > > moment. However, dropping it completely instead of improving it
> > > > > > would make Flink setups on dedicated clusters quite useless, right?
> > > > >
> > > > >
> > > > > 2015-04-29 17:33 GMT+02:00 Maximilian Michels <mxm@apache.org>:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > > Currently Flink accepts jobs from multiple clients and executes
> > > > > > > them concurrently if the resource limits are not exceeded.
> > > > > > > However, the multi-user support is very poor. We don't support
> > > > > > > queuing of jobs and concurrent jobs have to share resources in a
> > > > > > > nice way. Otherwise, jobs will fail.
> > > > > >
> > > > > > > Using YARN, we circumvent these problems because it provides a
> > > > > > > proper user and session management. I'm wondering now, should we
> > > > > > > get rid of the pseudo multi-user mode and just support one user
> > > > > > > per Flink cluster instance?
> > > > > >
> > > > > > Best,
> > > > > > Max
> > > > > >
> > > > > > PS:
> > > > > > > This question came up when I was working on a pull request to
> > > > > > > support backtracking intermediate results. I need to hold a copy
> > > > > > > of the full previous execution graph to resume from old results.
> > > > > > > With multiple users, we have to build in some kind of session
> > > > > > > management to archive old execution graphs. Otherwise, they will
> > > > > > > consume too much memory in the job manager.
> > > > > >
> > > > >
> > > >
> > >
> >
>
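
Editorial sketch (not part of the original thread): the "simple queuing
system in which jobs are executed one after another" proposed above could
look roughly like the following. This is a minimal illustration in Python,
not Flink code. The class name SingleJobQueue, its methods, and the job IDs
are all hypothetical, and jobs are modelled as plain callables.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class SingleJobQueue:
    """Hypothetical FIFO queue: accepts jobs from many clients, runs one at a time."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[str, Callable[[], object]]] = deque()
        self.finished: List[Tuple[str, object]] = []

    def submit(self, job_id: str, job: Callable[[], object]) -> int:
        """Enqueue a job and return its position in the queue (0 = runs next)."""
        self._pending.append((job_id, job))
        return len(self._pending) - 1

    def run_all(self) -> None:
        """Drain the queue in submission order, so jobs never compete for resources."""
        while self._pending:
            job_id, job = self._pending.popleft()
            self.finished.append((job_id, job()))


# Two clients submit at the same time; the jobs still run strictly in order.
queue = SingleJobQueue()
queue.submit("alice-wordcount", lambda: "wordcount done")
queue.submit("bob-pagerank", lambda: "pagerank done")
queue.run_all()
```

The sketch deliberately has no priorities or fairness policies, which is
the part the thread suggests leaving to YARN.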

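
Editorial sketch (not part of the original thread): the session management
described in the quoted proposal, where the job manager keeps a list of old
execution graphs per user session and archives them when the session ends
or times out, might be modelled as below. This is an assumption-laden
Python illustration, not Flink's API. SessionRegistry and its methods are
hypothetical names, execution graphs are represented by plain strings, and
the clock is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class SessionRegistry:
    """Hypothetical per-session store of old execution graphs with timeout archiving."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._graphs: Dict[str, List[str]] = {}
        self._last_seen: Dict[str, float] = {}
        self.archived: List[str] = []

    def record(self, session_id: str, graph: str, now: float) -> None:
        """Remember a finished job's graph and refresh the session's activity time."""
        self._graphs.setdefault(session_id, []).append(graph)
        self._last_seen[session_id] = now

    def graphs(self, session_id: str) -> List[str]:
        """Graphs a client of this session may still resume from."""
        return list(self._graphs.get(session_id, []))

    def end(self, session_id: str) -> None:
        """Archive all graphs of a session, whether ended by the user or by timeout."""
        self.archived.extend(self._graphs.pop(session_id, []))
        self._last_seen.pop(session_id, None)

    def expire(self, now: float) -> None:
        """Archive every session that has been idle longer than the timeout."""
        stale = [s for s, seen in self._last_seen.items()
                 if now - seen > self.timeout_seconds]
        for session_id in stale:
            self.end(session_id)


registry = SessionRegistry(timeout_seconds=60.0)
registry.record("alice", "graph-1", now=0.0)
registry.record("bob", "graph-2", now=50.0)
registry.expire(now=100.0)  # alice was idle for 100 s, bob only for 50 s
```

Nothing in the registry depends on how jobs are scheduled, which matches
the point at the top of this message that sessions and the
multi-job vs. job queue question are almost separate issues.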