flink-dev mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Proposal: Refactor distributed coordination to use the Akka Actor Library
Date Fri, 05 Sep 2014 15:10:16 GMT
+1


2014-09-05 6:46 GMT-07:00 Till Rohrmann <till.rohrmann@gmail.com>:

> +1
>
>
> On Fri, Sep 5, 2014 at 3:04 PM, Stephan Ewen <sewen@apache.org> wrote:
>
> > +1
> >
> >
> > On Fri, Sep 5, 2014 at 2:53 PM, Ufuk Celebi <uce@apache.org> wrote:
> >
> > > +1
> > >
> > >
> > > On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <ktzoumas@apache.org>
> > > wrote:
> > >
> > > > +1 for refactoring using Akka, the arguments are overwhelming.
> > > >
> > > >
> > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <rmetzger@apache.org>
> > > > wrote:
> > > >
> > > > > > I agree with using Akka for RPC. It is Apache 2.0 licensed and seems to
> > > > > > have a big community [1] and users [2] that depend on the system.
> > > > > >
> > > > > > The YARN client is also using the old RPC service. I would like to
> > > > > > rewrite it with Akka once we have added it into the other parts of the
> > > > > > system, to learn it.
> > > > >
> > > > >
> > > > > [1] https://github.com/akka/akka/pulls
> > > > > [2]
> > > > >
> > >
> http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html
> > > > >
> > > > >
> > > > >
> > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <sewen@apache.org> wrote:
> > > > >
> > > > > > > This proposes to refactor the RPC service and the coordination between
> > > > > > > Client, JobManager, and TaskManager to use the Akka actor library.
> > > > > > >
> > > > > > > Even though Akka is written in Scala, it offers a Java interface and we
> > > > > > > can use Akka completely from Java.
> > > > > > >
> > > > > > > Below is a list of arguments for why this would help the system:
> > > > > >
> > > > > >
> > > > > > Problems with the current RPC service:
> > > > > > --------------------------------------------------------
> > > > > >
> > > > > > >   - No asynchronous calls with callbacks. This is the reason why several
> > > > > > > parts of the runtime poll the status, introducing unnecessary latency.
> > > > > > >
> > > > > > >   - No exception forwarding (many exceptions are simply swallowed),
> > > > > > > making debugging and operation in flaky environments very hard.
> > > > > > >
> > > > > > >   - Limited number of handler threads. The RPC can only handle a fixed
> > > > > > > number of concurrent requests, forcing you to maintain separate thread
> > > > > > > pools to delegate actions to.
> > > > > > >
> > > > > > >   - No support for primitive data types (or boxed primitives) as
> > > > > > > arguments; everything has to be a specially serializable type.
> > > > > > >
> > > > > > >   - Problematic threading model. The RPC continuously spawns and
> > > > > > > terminates threads.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Benefits of switching to the Akka actor model:
> > > > > > > --------------------------------------------------------
> > > > > >
> > > > > >   - Akka solves all of the above issues out of the box
> > > > > >
> > > > > > >   - The supervisor model allows you to do failure detection of actors.
> > > > > > > That provides a unified way of detecting and handling failures (missing
> > > > > > > heartbeats, failed calls, ...).
> > > > > > >
> > > > > > >   - Akka has tools to make stateful actors persistent and restart them on
> > > > > > > other machines in cases of failure. That would greatly help in
> > > > > > > implementing "master fail-over", which will become important.
> > > > > > >
> > > > > > >   - You can define many "call targets" (actors). Tasks (on TaskManagers)
> > > > > > > can directly call their ExecutionVertex on the JobManager, rather than
> > > > > > > calling the JobManager, creating a Runnable that looks up the execution
> > > > > > > vertex, and so on...
> > > > > > >
> > > > > > >   - The actor model's approach of queuing actions on an actor and running
> > > > > > > them one after another makes the concurrency model of the state machine
> > > > > > > very simple and robust.
> > > > > > >
> > > > > > >   - We "outsource" our own concerns about maintaining and improving that
> > > > > > > part of the system.
> > > > > >
> > > > > > Greetings,
> > > > > > Stephan
> > > > > >
> > > > >
> > > >
> > >
> >
>
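
For readers of the archive: three points in the quoted proposal (asynchronous calls with callbacks, exception forwarding, and queuing actions on an actor so they run one after another) can be illustrated with a small sketch. This is not Akka and not Flink code. It uses only java.util.concurrent, and the names CounterActor, tellIncrement, askCount, askDivide, and MailboxSketch are hypothetical, chosen for illustration. A single-threaded executor stands in for the actor's mailbox.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stdlib-only sketch of an actor-style mailbox. Hypothetical names; not Akka's API. */
class CounterActor {
    // The single-threaded executor plays the role of the mailbox: its queue
    // holds pending messages and one thread processes them in arrival order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Read and written only on the mailbox thread, so no locking is needed.
    private int count = 0;

    /** Fire-and-forget message: enqueue an increment and return immediately. */
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    /** Asynchronous request: the reply arrives through the returned future. */
    CompletableFuture<Integer> askCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    /** A request that can fail: the exception completes the future instead of being swallowed. */
    CompletableFuture<Integer> askDivide(int divisor) {
        return CompletableFuture.supplyAsync(() -> count / divisor, mailbox);
    }

    /** Stops the mailbox thread once already queued messages have run. */
    void shutdown() {
        mailbox.shutdown();
    }
}

class MailboxSketch {
    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 1000; i++) {
            actor.tellIncrement();
        }
        // Callback instead of polling: runs once the reply is available.
        actor.askCount()
             .thenAccept(n -> System.out.println("count = " + n))
             .join();
        // The failure reaches the caller, wrapped in a CompletionException.
        actor.askDivide(0)
             .exceptionally(ex -> {
                 System.out.println("forwarded: " + ex.getCause());
                 return -1;
             })
             .join();
        actor.shutdown();
    }
}
```

Because every message runs on the same thread, the counter needs no synchronization, which is the simplification the proposal attributes to the actor model. Supervision, persistence, and remote messaging are outside the scope of this sketch.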
