mesos-user mailing list archives

From Sharma Podila <spod...@netflix.com>
Subject Re: Pitfalls when writing custom Frameworks
Date Mon, 01 Sep 2014 17:45:32 GMT
I am tempted to say that the short answer is, if your option B works, why
bother writing your own scheduler/framework?

Writing a Mesos framework can be easy. However, writing a fault-tolerant
Mesos framework that scales well, is performant, and is highly
available can be relatively hard. Here are a few things, off the top of my
head, that helped us make the decision to write our own:

   - There must be a good long term reason to write your own framework. The
   scheduling/preemption/allocation model you spoke of may be a good reason.
   For us, it was specific scheduling optimizations that are not generic and
   are absent in other frameworks.
   - Fault tolerance is a combination of several things. Here are a few to
   consider:
      - Task reconciliation with Mesos master currently will involve more
      than just using the reconcile feature. We augment it with heartbeats from
      tasks, Aurora runs a GC task, etc. I believe it will take another Mesos
      release (or two?) before we can rely solely on Mesos task reconciliation.
      - Framework itself must be highly available, for example, using
      ZooKeeper leader election among multiple framework instances.
      - Fault tolerant persistence of task states. For example, when Mesos
      calls your framework with a status update of a task, that state must be
      reliably persisted.
   - It sounds like achieving fair share allocation via preemptions is
   important to you. That "external entity" you refer to may be non-trivial in
   the long run. If you were to embark on writing your own framework, another
   model to consider is to just have one framework scheduler instance for all
   users. Then, put the preemptions and fair share logic inside it. There
   could be complexities. For example, with a heterogeneous mix of task and
   slave resource sizes, scaling down an arbitrary number of tasks from user A
   doesn't imply that user B will benefit. The scheduler can handle this
   better than an external entity, by preempting only the right tasks, etc.
      - That said, for simpler use cases, it may work just fine to have an
      external entity.
   - Scheduling itself is a hard problem, and it can slow down quickly once
   you go beyond first-fit style by adding a few constraints and SLAs.
   Preemptions, for example, can slow down the scheduler while it figures out
   the right tasks to preempt to honor the fair share SLAs. That is, assuming
   you have more than a few hundred tasks.
   - There were a few talks at MesosCon, ten days ago, on this topic
   including one from us. The video/slides from the conference should be
   available from MesosCon sometime soon.
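
To make the preemption point above concrete, here is a rough sketch (not our
actual scheduler code) of why "preempting the right ones" matters when task
and slave sizes are mixed. Everything in it is an assumption made up for
illustration: the Task record, the pick_preemptions helper, and the idea that
the scheduler keeps a map of free resources per slave. It is greedy
(largest tasks first) and does not call any Mesos API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical record of a running task, as tracked by the scheduler."""
    task_id: str
    user: str
    slave: str
    cpus: float
    mem_mb: float


def pick_preemptions(
    running: List[Task],
    free: Dict[str, Tuple[float, float]],  # slave -> (free cpus, free mem in MB)
    victim_user: str,
    need_cpus: float,
    need_mem_mb: float,
) -> Optional[List[Task]]:
    """Greedily choose victim_user's tasks on ONE slave so the pending task fits.

    Returns the shortest candidate list found across slaves, an empty list if
    some slave already has room, or None if no single slave can be freed up.
    """
    best: Optional[List[Task]] = None
    for slave, (cpus, mem) in free.items():
        # Consider only the victim's tasks on this slave, largest first.
        candidates = sorted(
            (t for t in running if t.slave == slave and t.user == victim_user),
            key=lambda t: (t.cpus, t.mem_mb),
            reverse=True,
        )
        chosen: List[Task] = []
        for t in candidates:
            if cpus >= need_cpus and mem >= need_mem_mb:
                break
            chosen.append(t)
            cpus += t.cpus
            mem += t.mem_mb
        fits = cpus >= need_cpus and mem >= need_mem_mb
        if fits and (best is None or len(chosen) < len(best)):
            best = chosen
    return best
```

Killing two small tasks of user A on one slave may free plenty of resources
in total and still not let a large task of user B fit anywhere, whereas
killing one large task on the right slave does. An external entity that only
says "scale down by N" cannot make that distinction.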

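Similarly, the idea of augmenting reconciliation with heartbeats from tasks
can be sketched roughly as below. This is only an illustration under my own
assumptions, not what we or Aurora actually run: HeartbeatTracker, its
methods, and the timeout are hypothetical names, and how heartbeats reach the
scheduler is left out entirely.

```python
from typing import Dict, Iterable, List, Tuple


class HeartbeatTracker:
    """Hypothetical helper: compares task heartbeats with the scheduler's view."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # task id -> time of last heartbeat

    def heartbeat(self, task_id: str, now: float) -> None:
        """Record a heartbeat received from a task at time `now` (seconds)."""
        self.last_seen[task_id] = now

    def audit(
        self, believed_running: Iterable[str], now: float
    ) -> Tuple[List[str], List[str]]:
        """Return (stale, orphans).

        stale:   tasks the scheduler believes are running but that have not
                 sent a heartbeat within the timeout (possibly lost).
        orphans: tasks still sending heartbeats that the scheduler does not
                 know about (candidates to be killed).
        Note: a task that has never sent a heartbeat counts as stale, so a real
        scheduler would need a grace period for freshly launched tasks.
        """
        believed = set(believed_running)
        stale = sorted(
            t for t in believed
            if now - self.last_seen.get(t, float("-inf")) > self.timeout_s
        )
        orphans = sorted(
            t for t, ts in self.last_seen.items()
            if t not in believed and now - ts <= self.timeout_s
        )
        return stale, orphans
```

The stale list is what you would then feed into an explicit reconciliation
request or treat as lost after a further delay; the orphan list covers the
case where the framework's persisted state and the slaves have drifted apart.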




On Sun, Aug 31, 2014 at 7:51 AM, Stephan Erb <stephan@dev.static-void.de>
wrote:

> Hi everybody,
>
> I would like to assess the effort required to write a custom framework.
>
> Background: We have an application where we can start a flexible number
> of long-running worker processes performing number-crunching. The more
> processes the better. However, we have multiple users, each running an
> instance of the application and therefore competing for resources (as
> each tries to run as many worker processes as possible).
>
> For various reasons, we would like to run our application instances on
> top of mesos. There seem to be two ways to achieve this:
>
>      A. Write a custom framework for our application that spawns the
>         worker processes on demand. Each user gets to run one framework
>         instance. We also need preemption of workers to achieve equality
>         among frameworks. We could achieve this using an external entity
>         monitoring all frameworks and telling the worst offenders to
>         scale down a little.
>      B. Instead of writing a framework, use a Service-Scheduler like
>         Marathon, Aurora or Singularity to spawn the worker processes.
>         Instead of just performing the scale-down, the external entity
>         would dictate the number of worker processes for each
>         application depending on its demand.
>
>
> The first choice seems to be the natural fit for Mesos. However,
> existing frameworks like Aurora seem to be battle-tested with regard to
> high availability, race conditions and issues like state reconciliation,
> where the world views of scheduler and slaves drift apart.
>
> So this question boils down to: When considering writing a custom
> framework, which pitfalls do I have to be aware of? Can I get away with
> blindly implementing the scheduler API? Or do I always have to implement
> stuff like custom state-reconciliation in order to prevent orphaned
> tasks on slaves (for example, when my framework scheduler crashes or is
> temporarily unavailable)?
>
> Thanks for your input!
>
> Best Regards,
> Stephan
>
>
>
>
>
