cloudstack-dev mailing list archives

From John Burwell <jburw...@basho.com>
Subject [DISCUSS/PROPOSAL] Upgrading Driver Model
Date Tue, 20 Aug 2013 21:43:17 GMT
All,

In capturing my thoughts on storage, my thinking backed into the driver model.  While we have
the beginnings of such a model today, I see the following deficiencies:

Multiple Models: The Storage, Hypervisor, and Security layers each have a slightly different
model for allowing system functionality to be extended/substituted.  These differences raise
the barrier to entry for vendors seeking to extend CloudStack and add code paths that must be
maintained and verified.
Leaky Abstraction:  Plugins are registered through a Spring configuration file.  In addition
to being operator unfriendly (most sysadmins are not Spring experts nor do they want to be),
this exposes the core bootstrapping mechanism to operators.  Therefore, a misconfiguration could
negatively impact the injection/configuration of internal management server components.  Essentially,
we are handing them a loaded shotgun pointed at our right foot.
Nondeterministic Load/Unload Model:  Because the core loading mechanism is Spring, the management
server has little control over the timing and order of component loading/unloading.  Changes to the
Management Server's component dependency graph could break a driver by causing it to be started
at an unexpected time.
Lack of Execution Isolation: As a Spring component, plugins are loaded into the same execution
context as core management server components.  Therefore, an errant plugin can corrupt the
entire management server.  

For the next revision of the plugin/driver mechanism, I would like to see us migrate towards a standard
pluggable driver model that supports all of the management server's extension points (e.g.
network devices, storage devices, hypervisors, etc.) with the following capabilities:

Consolidated Lifecycle and Startup Procedure:  Drivers share a common state machine and categorization
(e.g. network, storage, hypervisor, etc) that permits the deterministic calculation of initialization
and destruction order (i.e. network layer drivers -> storage layer drivers -> hypervisor
drivers).  Plugin inter-dependencies would be supported between plugins sharing the same category.
In-process Installation and Upgrade: Adding or upgrading a driver does not require the management
server to be restarted.  This capability implies a system that supports the simultaneous execution
of multiple driver versions and the ability to suspend work on a resource
while the underlying driver instance is replaced.
Execution Isolation: The deployment packaging and execution environment allow different
(and potentially conflicting) versions of dependencies to be used simultaneously.  Additionally,
plugins would be sufficiently sandboxed to protect the management server against driver instability.
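
The deterministic ordering described under Consolidated Lifecycle and Startup Procedure could be sketched roughly as follows.  This is only an illustration under assumptions: the names DriverCategory, DriverState, DriverHandle, and DriverLifecycle are hypothetical, the three states are placeholders since the state model is still open, and ties within a category are broken by name rather than by real inter-plugin dependencies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical categories, declared in initialization order
// (network -> storage -> hypervisor, as suggested in the proposal).
enum DriverCategory { NETWORK, STORAGE, HYPERVISOR }

// Placeholder states; the exact state model is not defined yet.
enum DriverState { INSTALLED, RUNNING, STOPPED }

final class DriverHandle {
    final String name;
    final DriverCategory category;
    DriverState state = DriverState.INSTALLED;

    DriverHandle(String name, DriverCategory category) {
        this.name = name;
        this.category = category;
    }
}

final class DriverLifecycle {
    // Category first (enum declaration order), then name as an assumed
    // tie-breaker so the result is the same on every run.
    private static final Comparator<DriverHandle> START_ORDER =
        Comparator.comparing((DriverHandle d) -> d.category)
                  .thenComparing(d -> d.name);

    // Returns the drivers in the order they were started.
    static List<DriverHandle> startAll(List<DriverHandle> drivers) {
        List<DriverHandle> ordered = new ArrayList<>(drivers);
        ordered.sort(START_ORDER);
        for (DriverHandle d : ordered) {
            d.state = DriverState.RUNNING;
        }
        return ordered;
    }

    // Destruction runs in the exact reverse of the start order.
    static List<DriverHandle> stopAll(List<DriverHandle> drivers) {
        List<DriverHandle> ordered = new ArrayList<>(drivers);
        ordered.sort(START_ORDER.reversed());
        for (DriverHandle d : ordered) {
            d.state = DriverState.STOPPED;
        }
        return ordered;
    }
}
```

Deriving the order from the category alone keeps startup independent of whatever order the drivers happen to be discovered in.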

Extension Data Model: Drivers provide a property bag with a metadata descriptor to validate
and render vendor specific data.  The contents of this property bag will be provided to every
driver operation invocation at runtime.  The metadata descriptor would be a lightweight description
that provides a label resource key, a description resource key, data type (string, date, number,
boolean), required flag, and optional length limit.
Introspection: Administrative APIs/UIs allow operators to inspect the drivers installed in
the system, their configuration, and their current state.
Discoverability: Optionally, drivers can be discovered via a project repository definition
(similar to Yum) allowing drivers to be remotely acquired and operators to be notified regarding
update availability.  The project would also provide, free of charge, certificates to sign
plugins.  This mechanism would allow local mirroring to support air-gapped management networks.
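
To make the Extension Data Model item above more concrete, here is a rough sketch of validating a property bag against its metadata descriptor.  Everything named here is an assumption for illustration: PropertyType, PropertyDescriptor, and PropertyBagValidator are hypothetical, the "name" field (the key into the bag) is not part of the descriptor fields listed above, values are assumed to arrive as strings, and dates are assumed to be ISO-8601.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The four data types named in the proposal.
enum PropertyType { STRING, DATE, NUMBER, BOOLEAN }

// Hypothetical descriptor: label key, description key, type,
// required flag, and optional length limit (null means no limit).
final class PropertyDescriptor {
    final String name;            // assumed key into the property bag
    final String labelKey;
    final String descriptionKey;
    final PropertyType type;
    final boolean required;
    final Integer maxLength;

    PropertyDescriptor(String name, String labelKey, String descriptionKey,
                       PropertyType type, boolean required, Integer maxLength) {
        this.name = name;
        this.labelKey = labelKey;
        this.descriptionKey = descriptionKey;
        this.type = type;
        this.required = required;
        this.maxLength = maxLength;
    }
}

final class PropertyBagValidator {
    // Returns one message per violation; an empty list means the bag is valid.
    static List<String> validate(List<PropertyDescriptor> descriptors,
                                 Map<String, String> bag) {
        List<String> errors = new ArrayList<>();
        for (PropertyDescriptor d : descriptors) {
            String value = bag.get(d.name);
            if (value == null || value.isEmpty()) {
                if (d.required) {
                    errors.add(d.name + ": required");
                }
                continue; // nothing further to check for an absent value
            }
            if (d.maxLength != null && value.length() > d.maxLength) {
                errors.add(d.name + ": exceeds length " + d.maxLength);
            }
            if (!matchesType(d.type, value)) {
                errors.add(d.name + ": not a valid " + d.type);
            }
        }
        return errors;
    }

    private static boolean matchesType(PropertyType type, String value) {
        try {
            switch (type) {
                case NUMBER:
                    new BigDecimal(value);   // throws if not numeric
                    return true;
                case DATE:
                    LocalDate.parse(value);  // ISO-8601 (yyyy-MM-dd) assumed
                    return true;
                case BOOLEAN:
                    return value.equals("true") || value.equals("false");
                default:
                    return true;             // STRING accepts any text
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }
}
```

The same descriptor list could also drive UI rendering, since the label and description keys point at the driver's message resources.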

Fundamentally, I do not want to turn CloudStack into an erector set with more screws than
nuts which is a risk with highly pluggable architectures.  As such, I think we would need
to tightly bound the scope of drivers and their behaviors to prevent the loss of system usability
and stability.  My thinking is that drivers would be packaged into a custom JAR, a CAR (CloudStack
ARchive), that would be structured as follows:

META-INF
    MANIFEST.MF
    driver.yaml (driver metadata (e.g. version, name, description, etc.) serialized in YAML format)
    LICENSE (a text file containing the driver's license)
lib (driver dependencies)
classes (driver implementation)
resources (driver message files and potentially JS resources)

The management server would acquire drivers through a simple scan of a URL (e.g. file directory,
S3 bucket, etc).  For every CAR object found, the management server would create an execution
environment (likely a dedicated ExecutorService and Classloader), and transition the state
of the driver to Running (the exact state model would need to be worked out).  To be really
nice, we could develop a custom Ant task/Maven plugin/Gradle plugin to create CARs.   I can
also imagine opportunities to add hooks to this model to register instrumentation information
with JMX and authorization.
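
As a rough illustration of the acquisition step described above, the sketch below scans a local directory for CARs and gives each one its own class loader and ExecutorService.  It rests on several assumptions: the ".car" file extension, the names EnvState, DriverEnvironment, and CarScanner, and the two-state model are all hypothetical, only the file directory case is covered, and a plain URLClassLoader pointed at the archive would not by itself resolve the classes and lib entries of the layout above.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Placeholder states; the real state model still needs to be worked out.
enum EnvState { RUNNING, STOPPED }

// One execution environment per CAR: a dedicated class loader plus a
// dedicated executor, so drivers do not share classes or threads.
final class DriverEnvironment implements AutoCloseable {
    final Path archive;
    final URLClassLoader classLoader;
    final ExecutorService executor;
    EnvState state = EnvState.RUNNING;

    DriverEnvironment(Path archive, URLClassLoader classLoader,
                      ExecutorService executor) {
        this.archive = archive;
        this.classLoader = classLoader;
        this.executor = executor;
    }

    @Override
    public void close() throws IOException {
        executor.shutdown();   // stop accepting new driver work
        classLoader.close();   // release the archive
        state = EnvState.STOPPED;
    }
}

final class CarScanner {
    // Scans a directory for *.car files (extension is an assumption) and
    // creates one environment per archive found.
    static List<DriverEnvironment> scan(Path dir) throws IOException {
        List<DriverEnvironment> environments = new ArrayList<>();
        try (DirectoryStream<Path> cars = Files.newDirectoryStream(dir, "*.car")) {
            for (Path car : cars) {
                URL url = car.toUri().toURL();
                // Parent-first delegation; stricter isolation of conflicting
                // dependencies would likely need a custom class loader.
                URLClassLoader loader = new URLClassLoader(
                    new URL[] { url }, CarScanner.class.getClassLoader());
                environments.add(new DriverEnvironment(
                    car, loader, Executors.newSingleThreadExecutor()));
            }
        }
        return environments;
    }
}
```

Closing an environment shuts down its executor and class loader together, which is roughly the hook an in-place upgrade would need before swapping in a new driver instance.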

To keep the scope of this email confined, we would introduce the general notion of a Resource,
and (hand wave hand wave) eventually compartmentalize the execution of work around a resource
[1].  This (hand waved) compartmentalization would give us the controls necessary to safely
and reliably perform in-place driver upgrades.  For an initial release, I would recommend
implementing the abstractions, loading mechanism, extension data model, and discovery features.
 With these capabilities in place, we could attack the in-place upgrade model.

If we were to adopt such a pluggable capability, we would have the opportunity to decouple
the vendor and CloudStack release schedules.  For example, if a vendor were introducing a
new product that required a new or updated driver, they would no longer need to wait for a
CloudStack release to support it.  They would also gain the ability to fix high priority defects
in the same manner. 

I have hand waved a number of issues that would need to be resolved before such an approach
could be implemented.  However, I think we need to decide, as a community, that it is worth devoting
energy and effort to enhancing the plugin/driver model, and agree on the goals of that effort, before
diving head first into the deep rabbit hole of design/implementation.  

Thoughts? (/me ducks)
-John

[1]: My opinions on the matter from CloudStack Collab 2013 -> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
