From John Burwell <>
Subject Re: [DISCUSS/PROPOSAL] Upgrading Driver Model
Date Tue, 20 Aug 2013 22:31:11 GMT

Before we can dig into timelines or implementations, I think we need to get consensus on the
problem to be solved and the goals.  Once we have a proper understanding of the scope, I believe
we can chunk the work across a set of development cycles.  The subject is vast, but it also
has a far-reaching impact on both the storage and network layer evolution efforts.  As such,
I believe we need to start addressing it as part of the next release.

As a separate thread, we need to discuss the timeline for the next release.  I think we need
to avoid the time compression caused by the overlap of the 4.1 stabilization effort and 4.2
development.  Therefore, I don't think we should consider development of the next release
started until the first 4.2 RC is released.  I will try to open a separate discuss thread
for this topic, as well as tying in the discussion of release code names.


On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <> wrote:

> Hey John,
> I think this is some great stuff. Thanks for the write-up.
> It looks like you have ideas around what might go into a first release of
> this plug-in framework. Were you thinking we'd have enough time to squeeze
> that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
> that release for this) because we would only have about five weeks.
> Thanks
> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <> wrote:
>> All,
>> In capturing my thoughts on storage, my thinking backed into the driver
>> model.  While we have the beginnings of such a model today, I see the
>> following deficiencies:
>>   1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>>   each have a slightly different model for allowing system functionality to
>>   be extended/substituted.  These differences raise the barrier to entry
>>   for vendors seeking to extend CloudStack and accrete code paths that must
>>   be maintained and verified.
>>   2. *Leaky Abstraction*:  Plugins are registered through a Spring
>>   configuration file.  In addition to being operator-unfriendly (most
>>   sysadmins are not Spring experts nor do they want to be), we expose the
>>   core bootstrapping mechanism to operators.  Therefore, a misconfiguration
>>   could negatively impact the injection/configuration of internal management
>>   server components.  We are essentially handing them a loaded shotgun
>>   pointed at our right foot.
>>   3. *Nondeterministic Load/Unload Model*:  Because the core loading
>>   mechanism is Spring, the management server has little control over the
>>   timing and order of component loading/unloading.  Changes to the
>>   Management Server's component dependency graph could break a driver by
>>   causing it to be started at an unexpected time.
>>   4. *Lack of Execution Isolation*: As Spring components, plugins are
>>   loaded into the same execution context as core management server
>>   components.  Therefore, an errant plugin can corrupt the entire management
>>   server.
>> For the next revision of the plugin/driver mechanism, I would like to see us
>> migrate towards a standard pluggable driver model that supports all of the
>> management server's extension points (e.g. network devices, storage
>> devices, hypervisors, etc.) with the following capabilities:
>>   - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>>   common state machine and categorization (e.g. network, storage, hypervisor,
>>   etc) that permits the deterministic calculation of initialization and
>>   destruction order (i.e. network layer drivers -> storage layer drivers ->
>>   hypervisor drivers).  Plugin inter-dependencies would be supported between
>>   plugins sharing the same category.
>>   - *In-process Installation and Upgrade*: Adding or upgrading a driver
>>   does not require the management server to be restarted.  This capability
>>   implies a system that supports the simultaneous execution of multiple
>>   driver versions and the ability to suspend the execution of work on a
>>   resource while the underlying driver instance is replaced.
>>   - *Execution Isolation*: The deployment packaging and execution
>>   environment allows different (and potentially conflicting) versions of
>>   dependencies to be used simultaneously.  Additionally, plugins would be
>>   sufficiently sandboxed to protect the management server against driver
>>   instability.
>>   - *Extension Data Model*: Drivers provide a property bag with a
>>   metadata descriptor to validate and render vendor specific data.  The
>>   contents of this property bag will be provided to every driver operation
>>   invocation at runtime.  The metadata descriptor would be a lightweight
>>   description that provides a label resource key, a description resource key,
>>   data type (string, date, number, boolean), required flag, and optional
>>   length limit.
>>   - *Introspection*: Administrative APIs/UIs allow operators to
>>   understand which drivers are present in the system, their
>>   configuration, and their current state.
>>   - *Discoverability*: Optionally, drivers can be discovered via a
>>   project repository definition (similar to Yum) allowing drivers to be
>>   remotely acquired and operators to be notified regarding update
>>   availability.  The project would also provide, free of charge, certificates
>>   to sign plugins.  This mechanism would allow local mirroring to support
>>   air-gapped management networks.
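
The Extension Data Model bullet above could be sketched roughly as follows.
This is a minimal illustration only: every class, field, and key name
(PropertyBagSketch, PropertyDescriptor, validate, "endpoint", and so on) is
hypothetical and not an existing CloudStack API, and date validation is left
out because the proposal does not specify a date format.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; nothing here is an existing CloudStack API.
class PropertyBagSketch {

    // The four data types named in the proposal.
    enum DataType { STRING, DATE, NUMBER, BOOLEAN }

    // One entry of the metadata descriptor.
    static final class PropertyDescriptor {
        final String name;
        final String labelKey;        // resource key for the UI label
        final String descriptionKey;  // resource key for the help text
        final DataType type;
        final boolean required;
        final Integer maxLength;      // optional; null means no limit

        PropertyDescriptor(String name, String labelKey, String descriptionKey,
                           DataType type, boolean required, Integer maxLength) {
            this.name = name;
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }
    }

    // Returns human-readable problems; an empty list means the bag is valid.
    static List<String> validate(List<PropertyDescriptor> descriptors,
                                 Map<String, String> bag) {
        List<String> errors = new ArrayList<>();
        for (PropertyDescriptor d : descriptors) {
            String value = bag.get(d.name);
            if (value == null || value.isEmpty()) {
                if (d.required) {
                    errors.add(d.name + ": required");
                }
                continue;
            }
            if (d.maxLength != null && value.length() > d.maxLength) {
                errors.add(d.name + ": longer than " + d.maxLength);
            }
            if (d.type == DataType.NUMBER) {
                try {
                    new BigDecimal(value);
                } catch (NumberFormatException e) {
                    errors.add(d.name + ": not a number");
                }
            } else if (d.type == DataType.BOOLEAN
                    && !value.equalsIgnoreCase("true")
                    && !value.equalsIgnoreCase("false")) {
                errors.add(d.name + ": not a boolean");
            }
            // DATE is not checked: the date format is an open question.
        }
        return errors;
    }

    public static void main(String[] args) {
        List<PropertyDescriptor> descriptors = Arrays.asList(
            new PropertyDescriptor("endpoint", "driver.endpoint.label",
                "driver.endpoint.description", DataType.STRING, true, 64),
            new PropertyDescriptor("retries", "driver.retries.label",
                "driver.retries.description", DataType.NUMBER, false, null));
        Map<String, String> bag = new HashMap<>();
        bag.put("retries", "three");
        System.out.println(validate(descriptors, bag));
    }
}
```

Keeping the descriptor this small is deliberate: the same structure can drive
both server-side validation and UI rendering without the driver shipping any
UI code of its own.
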
>> Fundamentally, I do not want to turn CloudStack into an erector set with
>> more screws than nuts, which is a risk with highly pluggable architectures.
>> As such, I think we would need to tightly bound the scope of drivers and
>> their behaviors to prevent the loss of system usability and stability.  My
>> thinking is that drivers would be packaged into a custom JAR, CAR
>> (CloudStack ARchive), that would be structured as follows:
>>   - META-INF
>>      - MANIFEST.MF
>>      - driver.yaml (driver metadata (e.g. version, name, description,
>>      etc.) serialized in YAML format)
>>      - LICENSE (a text file containing the driver's license)
>>   - lib (driver dependencies)
>>   - classes (driver implementation)
>>   - resources (driver message files and potentially JS resources)
>> The management server would acquire drivers through a simple scan of a URL
>> (e.g. file directory, S3 bucket, etc.).  For every CAR object found, the
>> management server would create an execution environment (likely a dedicated
>> ExecutorService and Classloader), and transition the state of the driver to
>> Running (the exact state model would need to be worked out).  To be really
>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to
>> create CARs.   I can also imagine opportunities to add hooks to this
>> model to register instrumentation information with JMX and authorization.
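
The scan-and-load step described above might look something like the sketch
below, assuming the scanned URL is a local directory and that CARs carry a
".car" file extension (the proposal names neither). DriverScannerSketch,
DriverEnvironment, and the three-value State enum are hypothetical
placeholders; the real state model is explicitly left open in the proposal,
and resolving the nested lib/ and classes/ entries inside the archive is
omitted.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical names throughout; nothing here is an existing CloudStack API.
class DriverScannerSketch {

    // Placeholder states; the proposal leaves the exact state model open.
    enum State { DISCOVERED, RUNNING, STOPPED }

    // One execution environment per CAR: its own class loader and executor.
    static final class DriverEnvironment implements AutoCloseable {
        final Path archive;
        final URLClassLoader classLoader;
        final ExecutorService executor;
        State state = State.DISCOVERED;

        DriverEnvironment(Path archive) throws IOException {
            this.archive = archive;
            // A class loader over the archive itself. Handling of the nested
            // lib/ and classes/ entries is not shown here.
            this.classLoader = new URLClassLoader(
                new URL[] { archive.toUri().toURL() },
                DriverScannerSketch.class.getClassLoader());
            this.executor = Executors.newSingleThreadExecutor();
        }

        void start() {
            state = State.RUNNING;
        }

        @Override
        public void close() throws IOException {
            executor.shutdownNow();
            classLoader.close();
            state = State.STOPPED;
        }
    }

    // Scans a directory for *.car files and starts one environment per file.
    // Directory iteration order is unspecified, so a real implementation
    // would need to sort drivers before starting them.
    static List<DriverEnvironment> scan(Path directory) throws IOException {
        List<DriverEnvironment> drivers = new ArrayList<>();
        try (DirectoryStream<Path> cars =
                 Files.newDirectoryStream(directory, "*.car")) {
            for (Path car : cars) {
                DriverEnvironment env = new DriverEnvironment(car);
                env.start();
                drivers.add(env);
            }
        }
        return drivers;
    }
}
```

A separate class loader per CAR is what would let two drivers bundle
conflicting dependency versions; on its own it does not sandbox a driver
against misbehaving, which would need additional controls.
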
>> To keep the scope of this email confined, we would introduce the general
>> notion of a Resource, and (hand wave hand wave) eventually compartmentalize
>> the execution of work around a resource [1].  This (hand waved)
>> compartmentalization would allow us the controls necessary to safely and
>> reliably perform in-place driver upgrades.  For an initial release, I would
>> recommend implementing the abstractions, loading mechanism, extension data
>> model, and discovery features.  With these capabilities in place, we could
>> attack the in-place upgrade model.
>> If we were to adopt such a pluggable capability, we would have the
>> opportunity to decouple the vendor and CloudStack release schedules.  For
>> example, if a vendor were introducing a new product that required a new or
>> updated driver, they would no longer need to wait for a CloudStack release
>> to support it.  They would also gain the ability to fix high priority
>> defects in the same manner.
>> I have hand waved a number of issues that would need to be resolved before
>> such an approach could be implemented.  However, I think we need to decide,
>> as a community, that it is worth devoting energy and effort to enhancing the
>> plugin/driver model and the goals of that effort before diving head first
>> into the deep rabbit hole of design/implementation.
>> Thoughts? (/me ducks)
>> -John
>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
