aurora-dev mailing list archives

From Joe Smith <yasumo...@gmail.com>
Subject Re: Thermos external component deprecation plan
Date Sat, 24 Jan 2015 16:43:22 GMT
Thanks for the write up!

> On Jan 22, 2015, at 13:27, Brian Wickman <wickman@apache.org> wrote:
> 
> Thermos is a standalone task execution system that is not coupled to Aurora
> or Mesos.  This is why by default, Thermos writes out of the sandbox
> (/var/run/thermos), has a separate observability system (Thermos observer),
> and CLI (thermos.)
> 
> Aurora built a Thermos executor as its default executor, but the scheduler
> is not architecturally tied to Thermos (or vice versa.)  In order to make
> things work smoothly with this decoupling, a Thermos-specific GC executor
> is also necessary to clean up the state left over by the execution of
> Thermos tasks and reconcile potential conflicts between the state of the
> Mesos master and Aurora scheduler.
> 
> Both the GC executor and Thermos observer violate some of the philosophical
> axioms of Mesos (e.g. out-of-sandbox access.)  They also significantly
> increase the complexity of building, deploying and maintaining Aurora.  I'm
> proposing removing both of them as required Aurora components.
> 
> In order to do this and make Thermos/Aurora/Mesos play together more
> nicely, several things are necessary.
> 
> 1) Moving /var/run/thermos for each task into the Mesos sandbox
> 
> Thermos is a state machine with all state transitions persisted to disk.
> Right now this goes to /var/run/thermos, but it should instead be persisted
> some place relative to the Mesos sandbox so that the Mesos slave can
> garbage collect this state once a Thermos task has completed.
> 
> This poses a task detection problem -- the Thermos CLI and Thermos observer
> rely upon the existence of /var/run/thermos to know what tasks are running,
> so we will need to develop a plugin to detect alternate task roots (see
> AURORA-1024 <https://issues.apache.org/jira/browse/AURORA-1024> AURORA-1025
> <https://issues.apache.org/jira/browse/AURORA-1025> AURORA-1026
> <https://issues.apache.org/jira/browse/AURORA-1026> AURORA-1027
> <https://issues.apache.org/jira/browse/AURORA-1027>).
> 
> 2) Making the Thermos executor responsible for the Thermos UI
> 
> In order to make the Thermos observer an optional component, the Thermos
> executor will need to assume Thermos observer responsibilities.  Since the
> Mesos slave already provides a webserver to serve executor sandboxes, I am
> proposing that the Thermos executor generates static HTML content that can
> be served by the Mesos slave as a UI.  This means that the executor can
> remain lean (no embedded webserver.)  See AURORA-725
> <https://issues.apache.org/jira/browse/AURORA-725> AURORA-777
> <https://issues.apache.org/jira/browse/AURORA-777>
> 
> 3) Making the Aurora scheduler responsible for state reconciliation
> 
> The last component that should be removed is the GC executor.  The GC
> executor performs the important task of state reconciliation, but this is
> now supported directly by the Mesos master.  See AURORA-715
> <https://issues.apache.org/jira/browse/AURORA-715> and specifically
> AURORA-1047 <https://issues.apache.org/jira/browse/AURORA-1047>.

Although the trusty gc_executor has been solid for a long time, removing it would definitely
simplify things, so +10.


> 
> Lastly, this work should make it much easier to support alternate executor
> implementations (including the Mesos default executor) from Aurora once a
> proper Aurora API (AURORA-987
> <https://issues.apache.org/jira/browse/AURORA-987>) is available.
> 
> ~brian
