reef-dev mailing list archives

From John Yang <johnya...@gmail.com>
Subject Re: Design Considerations on reef-1791
Date Wed, 14 Jun 2017 01:19:49 GMT
Hi Saikat,


Many thanks for working on the mesos runtime!
I can answer 4): Yes, we can do without the extra remote managers, but with
some caveats.

By default, Mesos employs pessimistic concurrency control
<https://research.google.com/pubs/pub41684.html> in giving out resource
offers.
So from our (REEF) perspective, once we get a resource offer from Mesos, I
believe the offer is effectively ours to keep, and no other job can take it
away from us.
With this in mind, the mesos runtime can do the following, which doesn't
really require any extra RemoteManagers.

   - Upon start: Be a good citizen and reject any incoming offers, since we
   don't need any resources yet
   - Upon resource request: Keep an appropriate offer
   - Upon resource launch: Simply launch a REEF evaluator with the offer

Let's call this Design A.
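
To make the three steps of Design A concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions, not the actual mesos runtime code: `DesignAOfferHandler`, its nested `Offer` class, and all method names are hypothetical, and offers are matched on memory alone. In a real runtime the offer handling would presumably live in the Mesos `Scheduler#resourceOffers` callback, with declines going through `SchedulerDriver#declineOffer` and launches through `SchedulerDriver#launchTasks`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Design A; none of these types are REEF or Mesos classes.
final class DesignAOfferHandler {

  /** Minimal stand-in for a Mesos resource offer (assumption: memory only). */
  static final class Offer {
    final String id;
    final int memoryMb;

    Offer(final String id, final int memoryMb) {
      this.id = id;
      this.memoryMb = memoryMb;
    }
  }

  // Memory sizes (MB) of resource requests not yet matched to an offer.
  private final Deque<Integer> pendingRequests = new ArrayDeque<>();
  // Offers kept for a request, keyed by offer id, waiting for a launch.
  private final Map<String, Offer> heldOffers = new HashMap<>();
  // Ids of declined offers (stands in for calling declineOffer on the driver).
  final List<String> declined = new ArrayList<>();
  // Ids of offers an evaluator was launched on (stands in for launchTasks).
  final List<String> launched = new ArrayList<>();

  /** Upon resource request: remember the request so a later offer can be kept. */
  void onResourceRequest(final int memoryMb) {
    pendingRequests.addLast(memoryMb);
  }

  /** Upon offer: keep it if it fits the oldest pending request, else decline. */
  void onOffer(final Offer offer) {
    final Integer wanted = pendingRequests.peekFirst();
    if (wanted != null && offer.memoryMb >= wanted) {
      pendingRequests.removeFirst();
      heldOffers.put(offer.id, offer);
    } else {
      // Nothing requested yet (or the offer is too small): be a good citizen.
      declined.add(offer.id);
    }
  }

  /** Upon resource launch: launch the evaluator directly on the held offer. */
  boolean onResourceLaunch(final String offerId) {
    final Offer offer = heldOffers.remove(offerId);
    if (offer == null) {
      return false; // The offer was never kept, or was already used.
    }
    launched.add(offer.id);
    return true;
  }
}
```

Note that no extra channel is needed here: the offer itself is held between the request and the launch, which is the point of Design A.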

However, the current mesos runtime implementation (let's call it Design B)
does not work like Design A.
The main reason is that Mesos allows custom allocators
<http://mesos.apache.org/documentation/latest/allocation-module/#writing-a-custom-allocator>
that make offers to multiple jobs simultaneously.
So to make sure the resources are really ours, Design B launches a Mesos
task upon resource request, and the task sets up a RemoteManager channel
through which the REEF evaluator is launched.

I must admit that had I known more about the pessimistic locking three years
ago when I wrote the mesos runtime, I would've thought about going with
Design A, which covers the common case much more nicely.
And then, I would've handled the behaviors of custom allocators as
exceptional cases by implementing the Scheduler#offerRescinded callback,
although I'm still not sure if it's straightforward to do so with REEF.
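
As a rough illustration of that exceptional path, the sketch below shows one way the bookkeeping behind an offerRescinded handler could look. It is an assumption-laden sketch in plain Java, not REEF or Mesos code: `RescindAwareOfferBook` and its methods are hypothetical names, and offers and requests are reduced to string ids. The idea is simply that a rescinded offer is dropped and its request goes back to the front of the queue, so the next suitable offer can be kept for it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for rescinded offers; not a REEF or Mesos class.
final class RescindAwareOfferBook {

  // Maps the id of each held offer to the id of the request it was kept for.
  private final Map<String, String> requestByOffer = new HashMap<>();
  // Requests that lost their offer and need to be matched again.
  private final Deque<String> unmatchedRequests = new ArrayDeque<>();

  /** Records that an offer is being held for a given resource request. */
  void hold(final String offerId, final String requestId) {
    requestByOffer.put(offerId, requestId);
  }

  /**
   * Intended to be called from the scheduler's offerRescinded callback.
   * Returns true if the rescinded offer was one we were holding.
   */
  boolean onOfferRescinded(final String offerId) {
    final String requestId = requestByOffer.remove(offerId);
    if (requestId == null) {
      return false; // We never kept this offer, so nothing to undo.
    }
    // Re-queue at the front so the affected request is matched first.
    unmatchedRequests.addFirst(requestId);
    return true;
  }

  /** Returns the next request waiting for a new offer, or null if none. */
  String nextUnmatchedRequest() {
    return unmatchedRequests.peekFirst();
  }

  /** Returns whether the given offer is still held. */
  boolean isHeld(final String offerId) {
    return requestByOffer.containsKey(offerId);
  }
}
```

How such a re-queued request would be surfaced through REEF's own resource request handling is left open here, which is the part I'm unsure about.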

All in all, I believe the mesos runtime hasn't really been maintained since
it was first written, and has bits that need to be refactored.
For example, I see that we're still using Mesos 0.25.0, when 1.2.0
<http://mesos.apache.org/> has been released.

Hope this helps.


Thanks,
John


On Wed, Jun 14, 2017 at 8:17 AM, Saikat Kanjilal <sxk1969@gmail.com> wrote:

> @Markus/Sergiy,
> I've spent the past few days or so studying the implementation of the
> reef-runtime-mesos and had some things I wanted to discuss, as I mentioned
> before I created reef-runtime-spark as a clone of the mesos runtime as a
> first step.  However the more I look at the code and try to figure out how
> to merge
> https://github.com/apache/reef/tree/master/lang/scala/
> reef-examples-scala/src/main/scala/org/apache/reef/examples/hellospark
> into reef-runtime-spark there are several things that come to mind needing
> further discussion:
>
> 1) the mesos runtime is currently using Google Protocol Buffers and the mesos
> task API, am assuming we don't need any of this for the spark runtime or
> any of the interfaces with avro, is that assumption correct
> 2) I see a lot of classes in the org.apache.reef.runtime.mesos.driver
> package associated with Launching, Releasing, Requesting Resources, in the
> interim I renamed all these to Spark versions and am assuming we can still
> reuse these, do you see any issues with this, if we can reuse these they
> will be available through the SparkDriverConfiguration which extends
> ConfigurationModuleBuilder (again similar to Mesos implementation)
> 3) I also renamed all of the mesos evaluator packages to their spark
> counterparts, do you see any issues with reusing the evaluator parameters
> classes
> 4) Finally I am looking at the mesos util directory and I am wondering if
> we can do without any of the Remote management functionality (i.e.
> MesosRemoteManager etc)
>
>
> Would love some input on this as I piece through the first implementation
> of the reef-runtime-spark.
> Regards
>
