flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Replacing JobManager with Scala implementation
Date Wed, 03 Sep 2014 16:27:38 GMT
Thanks @Ufuk for the response.

Yeah, Akka hides all the low-level nuts and bolts of the RPC flow,
but it also makes it a bit harder to debug issues when communication
fails.
It makes sense to use one RPC framework if we can, and since there
are other plans for Akka in the code to help manage concurrent
programming, it is a good idea to use Akka for RPC.

- Henry


On Wed, Sep 3, 2014 at 5:06 AM, Ufuk Celebi <uce@apache.org> wrote:
> Hey Till,
>
> I'm not sure what the "right" ASF process is, but I wouldn't mind a vote on
> this in order to make sure that you don't do unnecessary work by replacing
> the code with Scala.
>
> I for one would be certainly open to it. The only thing that bothers me is
> the current state of out-of-the-box IDE support. But since there are other
> successful Scala projects around ;-), which manage to do it, why shouldn't
> we?
>
> @Henry, regarding Akka: I think the main motivation for moving to Akka
> (besides the points raised by Stephan and others) is that we actually don't
> want to bother with low-level thread management, protocols, etc.
>
>
>
> On Tue, Sep 2, 2014 at 8:32 PM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
>> Hi Till,
>>
>> Thanks for opening the discussion and leading the effort, and apologies
>> for the late response.
>>
>> From what I have gathered so far, there are 2 issues:
>> 1. Introducing Akka as RPC
>> 2. Moving to Scala to enable easy access to Akka Scala APIs.
>>
>> For no. 1, if the RPC is used for lower-level communications then we
>> could probably consider Netty as the transport and serialization
>> protocol (I have also added a comment to the JIRA).
>> Internally, to reduce thread management, we could use Akka via a Scala
>> bridge service to make sure we use the Scala Akka APIs.
>>
>> So, addressing no. 2, we could mix both Scala and Java in JobManager and
>> TaskManager. The code that handles async RPC communication between JM
>> and TM would use Java via Netty, and for internal multi-threading or
>> higher-level plane code such as heartbeats we could use Akka.
>>
>> It does introduce a bit of a mix between Java and Scala code, but we already
>> have a mix of Scala and Java to support the APIs, so I think we could move
>> some of the internal code to Scala too, as a "learning" step toward using
>> Scala for better concurrent/functional programming.
>>
>> - Henry
>>
>>
>>
>> On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <till.rohrmann@gmail.com>
>> wrote:
>> > Hi Daniel,
>> >
>> > the RPC rework is discussed in
>> > https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently down
>> > due to maintenance reasons.
>> >
>> > The ideas behind using Akka are the following. Akka allows us to reduce the
>> > code base which has to be maintained. In particular, we get rid of all the
>> > multi-threaded programming of the RPC service, which is always hard to work
>> > with. With Akka we would get the heartbeat signal for free, because Akka
>> > can detect dead actors. Akka uses supervision to handle fault tolerance as
>> > well as recovery, and it allows easy forwarding of remote exceptions. At
>> > the same time it offers a nice RPC abstraction which makes it easy to
>> > implement asynchronous services. Furthermore, it scales rather well to
>> > large numbers of nodes, and hopefully we get the latencies of Flink down a
>> > little bit.
>> >
>> > Bests,
>> >
>> > Till
>> >
>> >
>> > On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <warneke@apache.org>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> will Akka be used just for RPC, or are there any plans to expand the
>> >> actor-based model to further parts of the runtime system? If so, could you
>> >> please point me to the discussion thread?
>> >>
>> >> Spontaneously, I would say that adding a hard dependency on Scala just for
>> >> the sake of having a hip RPC service sounds like a pretty dodgy deal.
>> >> Therefore, I would like to understand how much value Akka could bring to
>> >> Flink in the long run. The discussion of whether to reimplement core
>> >> components of the system in Scala should be the second step, in my opinion.
>> >>
>> >> Bests,
>> >>
>> >>     Daniel
>> >>
>> >>
>> >> On 29.08.2014 at 11:33, Asterios Katsifodimos wrote:
>> >>
>> >>> I agree that using Akka's actors from Java results in very ugly code.
>> >>> Hiding the internals of Akka behind Java reflection looks better but breaks
>> >>> the principles of actors. For me it is kind of a deal breaker for using
>> >>> Akka from Java. I think that Till has more reasons to believe that Scala
>> >>> would be more appropriate for building a new Job/Task Manager.
>> >>>
>> >>> I think that this discussion should focus on 4 main aspects:
>> >>> 1. Performance
>> >>> 2. Implementability
>> >>> 3. Maintainability
>> >>> 4. Available Tools
>> >>>
>> >>> 1. Performance: The job of the JobManager and the TaskManager is to
>> >>> 1) exchange messages in order to maintain a distributed state machine,
>> >>> 2) set up connections between task managers, and 3) detect failures, etc.
>> >>> In these basic operations, performance should not be an issue. Akka has
>> >>> proven to scale quite well with very low latency. I guess that the low-level
>> >>> "plumbing" (serialization, connections, etc.) will continue in Java in
>> >>> order to guarantee high performance. I have no clue what's happening
>> >>> with memory management, whether it will be implemented in Java or
>> >>> Scala, and what the respective consequences are. Please comment.
>> >>>
>> >>> 2. Since the Job/Task Manager is going to be essentially implemented from
>> >>> scratch, given the power of Akka, it seems to me that the implementation
>> >>> will be easier, shorter and less verbose in Scala, given that Till is
>> >>> comfortable enough with Scala.
>> >>>
>> >>> 3. Given #2, maintaining the code and trying out new ideas in Scala would
>> >>> take less time and effort. But maintaining low-level plumbing in Java and
>> >>> high-level logic in Scala scares me. Could anyone who has done this before
>> >>> comment on this?
>> >>>
>> >>> 4. Tools: Robert has raised some issues already, but I think that tools will
>> >>> get better with time.
>> >>>
>> >>> Given the above, I would focus on #3, to be honest. Apart from this, going
>> >>> the Scala way sounds like a great idea. I really second Kostas' opinion
>> >>> that if large changes are going to happen, this is the best moment.
>> >>>
>> >>> Cheers,
>> >>> Asterios
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <till.rohrmann@gmail.com>
>> >>> wrote:
>> >>>
>> >>>  I also agree with Robert and Kostas that it has to be a community
>> >>>> decision.
>> >>>> I understand the problems with Eclipse and the Scala IDE which is
a
>> pain
>> >>>> in
>> >>>> the ass. But eventually these things will be fixed. Maybe we could
>> also
>> >>>> talk to the typesafe guy and tell him that this problem bothers
us a
>> lot.
>> >>>>
>> >>>> I also believe that the project is not about a specific programming
>> >>>> language but a problem we want to tackle with Flink. From time to
>> time it
>> >>>> might be necessary to adapt the tools in order to reach the goal.
In
>> >>>> fact,
>> >>>> I don't believe that Scala parts would drive people away from the
>> >>>> project.
>> >>>> Instead, Scala enthusiasts would be motivated to join us.
>> >>>>
>> >>>> Actually I stumbled across a quote of Leibniz which put's my point
of
>> >>>> view
>> >>>> quite accurately in a nutshell:
>> >>>>
>> >>>> In symbols one observes an advantage in discovery which is greatest
>> when
>> >>>> they express the exact nature of a thing briefly and, as it were,
>> picture
>> >>>> it; then indeed the labor of thought is wonderfully diminished --
>> >>>> Gottfried
>> >>>> Wilhelm Leibniz
>> >>>>
>> >>>>
>> >>>> On Thu, Aug 28, 2014 at 12:57 PM, Kostas Tzoumas <ktzoumas@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>  On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <
>> rmetzger@apache.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>>  Changing the programming language of a very important system
>> component
>> >>>>>>
>> >>>>> is
>> >>>>
>> >>>>> something we should carefully discuss.
>> >>>>>>
>> >>>>>>  Definitely agree, I think the community should discuss
this very
>> >>>>>
>> >>>> carefully.
>> >>>>
>> >>>>>
>> >>>>>  I understand that Akka is written in Scala and that it will
be much
>> >>>>>>
>> >>>>> more
>> >>>>
>> >>>>> natural to implement the actor based system using Scala.
>> >>>>>> I see the following issues that we should consider:
>> >>>>>> Until now, Flink is clearly a project implemented only in
Java. The
>> >>>>>>
>> >>>>> Scala
>> >>>>
>> >>>>> API basically sits on top of the Java-based runtime. We do not
really
>> >>>>>> depend on Scala (we could easily remove the Scala API if
we want
>> to).
>> >>>>>> Having code written in Scala in the main system will add
a hard
>> >>>>>>
>> >>>>> dependency
>> >>>>>
>> >>>>>> on a scala version.
>> >>>>>> Being a pure Java project has some advantages: I think its
a fact
>> that
>> >>>>>> there are more Java programmers than Scala programmers.
So our
>> chances
>> >>>>>>
>> >>>>> of
>> >>>>
>> >>>>> attracting new contributors are higher when being a Java project.
>> >>>>>> On the other hand, we could maybe attract Scala developers
to our
>> >>>>>>
>> >>>>> project.
>> >>>>>
>> >>>>>> But that has not happened (for contributors, not users!)
so far for
>> our
>> >>>>>> Scala API, so I don't see any reason for that to happen.
>> >>>>>>
>> >>>>>>
>> >>>>>>  This is definitely an issue to consider. We need to carefully
>> weight
>> >>>>> how
>> >>>>> important this issue is. If we want to break things, incubation
is
>> the
>> >>>>> right time to do it. Below are some arguments in favor of breaking
>> >>>>>
>> >>>> things,
>> >>>>
>> >>>>> but do keep in mind that I am undecided, and I would really
like to
>> see
>> >>>>>
>> >>>> the
>> >>>>
>> >>>>> community weighing in.
>> >>>>>
>> >>>>> First, I would dare say that the primary reason for someone
to
>> >>>>> contribute
>> >>>>> to Flink so far has not been that the code is written in Java,
but
>> more
>> >>>>>
>> >>>> the
>> >>>>
>> >>>>> content and nature of the project. Most contributors are Big
Data
>> >>>>> enthusiasts in some way or another.
>> >>>>>
>> >>>>> Second, Scala projects have attracted contributors in the past.
>> >>>>>
>> >>>>> Third, it should not be too hard for someone that does not know
>> Scala to
>> >>>>> contribute to a different component if the interfaces are clear.
>> >>>>>
>> >>>>>
>> >>>>>  Another issue is tooling: There are a lot of problems with
Scala and
>> >>>>>> Eclipse: I've recently switched to Eclipse Luna. It seems
to be
>> >>>>>>
>> >>>>> impossible
>> >>>>>
>> >>>>>> to compile Scala code with Luna because ScalaIDE does not
properly
>> cope
>> >>>>>> with it.
>> >>>>>> Even with Eclipse versions that are supported by ScalaIDE,
you have
>> to
>> >>>>>> manually install 3 plugins, some of them are not available
in the
>> >>>>>>
>> >>>>> Eclipse
>> >>>>
>> >>>>> Marketplace. So with a JobManager written in Scala, users can
not
>> just
>> >>>>>> import our project as a Maven project into Eclipse and start
>> >>>>>>
>> >>>>> developing.
>> >>>>
>> >>>>> The support for Maven is probably also limited. For example,
I don't
>> >>>>>>
>> >>>>> know
>> >>>>
>> >>>>> if there is a checkstyle plugin for Scala.
>> >>>>>>
>> >>>>>> I'm looking forward to hearing other opinions on this issue.
As I
>> said
>> >>>>>>
>> >>>>> in
>> >>>>
>> >>>>> the beginning, we should exchange arguments on this and think
about
>> it
>> >>>>>>
>> >>>>> for
>> >>>>>
>> >>>>>> some time before we decide on this.
>> >>>>>>
>> >>>>>>  Best,
>> >>>>>
>> >>>>>> Robert
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Aug 28, 2014 at 1:08 AM, Till Rohrmann <trohrmann@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi guys,
>> >>>>>>>
>> >>>>>>> I am currently working on replacing the old RPC infrastructure with an
>> >>>>>>> Akka-based actor system. In the wake of this change I will reimplement
>> >>>>>>> the JobManager and TaskManager, which will then be actors. Akka offers
>> >>>>>>> a Java API, but the implementation turns out to be very verbose and
>> >>>>>>> laborious, because Java 6 and 7 do not support lambdas and pattern
>> >>>>>>> matching. Using Scala instead would allow a far more succinct and
>> >>>>>>> clear implementation of the JobManager and TaskManager. Instead of a
>> >>>>>>> lot of if statements using instanceof to figure out the message type,
>> >>>>>>> we could simply use pattern matching. Furthermore, the callback
>> >>>>>>> functions could simply be Scala's anonymous functions. Therefore I
>> >>>>>>> would propose to use Scala for these two systems.
>> >>>>>>>
>> >>>>>>> The Akka system uses the slf4j library as its logging interface.
>> >>>>>>> Therefore I would also propose to replace the jcl logging system with
>> >>>>>>> the slf4j logging system. Since we want to use Akka in many parts of
>> >>>>>>> the runtime system and it recommends using logback as the logging
>> >>>>>>> backend, I would also like to replace log4j with logback. But this
>> >>>>>>> should require only a few changes once we have established the slf4j
>> >>>>>>> logging interface everywhere.
>> >>>>>>>
>> >>>>>>> What do you guys think of that idea?
>> >>>>>>>
>> >>>>>>> Best regards,
>> >>>>>>>
>> >>>>>>> Till
>> >>>>>>>
>> >>>>>>>
>> >>
>>
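
Till's original proposal at the bottom of this thread contrasts Java's chains of instanceof checks with Scala pattern matching. As a rough sketch of the Java side only, assuming three hypothetical message types (RegisterTaskManager, Heartbeat and SubmitJob are illustrative names, not Flink's actual classes) and no Akka dependency, Java 6/7 style dispatch might look like this:

```java
// Illustrative sketch only: the message classes below are hypothetical
// and are not taken from the Flink code base or from Akka.
public class MessageDispatch {

    static final class RegisterTaskManager {
        final String host;
        RegisterTaskManager(String host) { this.host = host; }
    }

    static final class Heartbeat {
        final String taskManagerId;
        Heartbeat(String taskManagerId) { this.taskManagerId = taskManagerId; }
    }

    static final class SubmitJob {
        final String jobName;
        SubmitJob(String jobName) { this.jobName = jobName; }
    }

    // Java 6/7 style: every message type needs an instanceof test
    // followed by an explicit cast before its fields can be used.
    static String handle(Object message) {
        if (message instanceof RegisterTaskManager) {
            RegisterTaskManager m = (RegisterTaskManager) message;
            return "registered " + m.host;
        } else if (message instanceof Heartbeat) {
            Heartbeat m = (Heartbeat) message;
            return "heartbeat from " + m.taskManagerId;
        } else if (message instanceof SubmitJob) {
            SubmitJob m = (SubmitJob) message;
            return "submitted " + m.jobName;
        } else {
            // Fallback for message types this handler does not know.
            return "unhandled " + message.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new RegisterTaskManager("worker-1")));
        System.out.println(handle(new Heartbeat("tm-1")));
        System.out.println(handle("unexpected"));
    }
}
```

In Scala, each branch would collapse into a single case clause that tests the type and binds the value in one step, which is the succinctness the proposal is arguing for.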
