Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cloudstack.apache.org
Received-SPF: pass (nike.apache.org: domain of jburwell@basho.com designates
 209.85.216.171 as permitted sender)
Content-Type: multipart/signed;
 boundary="Apple-Mail=_8426603C-1408-453B-95B4-97E0DCA5705B";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: [DISCUSS/PROPOSAL] Upgrading Driver Model
From: John Burwell <jburwell@basho.com>
In-Reply-To: 
 <CAGQtxvYikvyotv6-qLr=nzuNPOiujP_eOrN-Ew1pSzzSmA3eQw@mail.gmail.com>
Date: Wed, 21 Aug 2013 03:00:15 -0400
Cc: Darren Shepherd <darren.s.shepherd@gmail.com>,
 Hugo Trippaers <htrippaers@schubergphilis.com>,
 "La Motta, David" <David.LaMotta@netapp.com>
Message-Id: <9B40F0CE-0F16-41E5-B605-37770AD75C05@basho.com>
References: <80AC8E03-0EF4-4032-95C2-69273512357D@basho.com>
 <CACOnxCg0et2KrPrBYmgj4tPUySDck7=RWKkroN-=kEQWrUiqgQ@mail.gmail.com>
 <707F0358-E016-4BEE-8072-AAC62EAE9108@basho.com>
 <CACOnxCg9UR6vLcp4NPwrWuSyeeyhrLhg2RhMA373st2FNmAzbQ@mail.gmail.com>
 <9E3C0FDB-B60D-419C-8702-ED3923094111@gmail.com>
 <CA72FC7F-A112-4EBF-B782-B77428708139@basho.com>
 <0247E3A0-E19E-45B4-9548-C1DE97A438A0@gmail.com>
 <CAGQtxvYikvyotv6-qLr=nzuNPOiujP_eOrN-Ew1pSzzSmA3eQw@mail.gmail.com>
To: dev@cloudstack.apache.org

--Apple-Mail=_8426603C-1408-453B-95B4-97E0DCA5705B
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_FA0DC2A3-BF85-43C3-AC2D-3AAAD431D104"


--Apple-Mail=_FA0DC2A3-BF85-43C3-AC2D-3AAAD431D104
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252

Daan,

I have the following issues with OSGi:=20

- Complexity:  Building OSGi components adds a tremendous amount of =
complexity to both building drivers and debugging runtime issues.  =
Additionally, OSGi has a much broader feature set than I think =
CloudStack needs to support.  Therefore, driver authors may use the =
feature set in unanticipated ways that create system instability.
- Dependency Hell: OSGi requires 3rd party dependencies to be packaged as =
OSGi bundles.  In practice, many third party libraries either have =
issues that prevent them from being bundles or their OSGi bundled =
versions are behind the mainline release.

Speaking from personal experience, I do not want to re-create the =
mess that is Eclipse (i.e. an erector set with more screws than nuts).  =
In addition to its lack of reliability, it is incredibly difficult to =
comprehend how the component configurations and relationships are =
composed at runtime.

To be clear, I am not interested in creating a general purpose =
component/plugin model.  Fundamentally, we need a simple, purpose-built =
component model focused on providing stability and reliability through =
deterministic behavior rather than feature flexibility.  Unfortunately, =
OSGi and Spring both focus on the latter, which makes them =
ill-suited for our purposes.

Thanks,
-John

On Aug 21, 2013, at 2:31 AM, Daan Hoogland <daan.hoogland@gmail.com> =
wrote:

> John,
>=20
> Nice work.
> Given the maturity of OSGi, I'd say let's see how it fits. One criterion
> would be whether we can limit the bundles that may be loaded to what
> CloudStack supports (and not allow loading pydev); if not, we need to
> bake our own.
>=20
> But though I think your work is valuable, I disagree with designing our
> CARs from the get-go without first exploring usable options in the
> field. A new type of YARs is not what the world or CloudStack
> needs. And given what you have written, the main problem will be finding
> a framework we can restrict to what we want, not one that can do all
> of it.
>=20
> done shooting,
> Daan
>=20
> On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
> <darren.s.shepherd@gmail.com> wrote:
>> Sure, I fully understand how it theoretically works, but I'm saying =
from a
>> practical perspective it always seems to fall apart.  What you're =
describing
>> is done excellently in OSGi 4.2 Blueprint.  It's a beautiful =
framework that
>> allows you to expose services that can be dynamically updated at =
runtime.
>>=20
>> The issues always happen with unloading.  I'll give you a real world
>> example.  As part of the servlet spec you're supposed to be able to =
stop and
>> unload WARs.  But in practice if you do it enough times you typically =
run
>> out of memory.  So one such issue was with commons logging (since =
fixed).
>> When you do getLogger(MyClass.class) it would cache a reference from =
the Class
>> object to the actual log impl.  The commons logging jar is typically =
loaded
>> with a system classloader, but MyClass.class would be loaded in =
the
>> webapp classloader.  So when you stop the WAR there is a reference =
chain
>> system classloader -> logfactory -> MyClass -> webapp classloader.  =
So the
>> web app never gets GC'd.
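
To make the reference chain above concrete, here is a minimal Java
sketch. It is an illustration only, not the actual commons-logging code.
SharedLogRegistry is a hypothetical stand-in for a logging factory loaded
by the system classloader, and a JDK dynamic proxy class defined in a
throwaway URLClassLoader stands in for MyClass inside the webapp
classloader. All class and method names are assumptions made for this
example.

```java
import java.lang.reflect.Proxy;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a logging factory owned by the system
// classloader. The static map lives as long as that classloader does.
class SharedLogRegistry {
    static final Map<Class<?>, String> LOGGERS = new ConcurrentHashMap<>();

    static String getLogger(Class<?> owner) {
        // The Class key is strongly referenced, so its defining
        // classloader stays reachable until the entry is removed.
        return LOGGERS.computeIfAbsent(owner, c -> "log:" + c.getName());
    }

    // True if any cached key was defined by the given classloader.
    static boolean pins(ClassLoader loader) {
        for (Class<?> c : LOGGERS.keySet()) {
            if (c.getClassLoader() == loader) {
                return true;
            }
        }
        return false;
    }

    // One possible fix: drop every entry whose class came from the loader.
    static void release(ClassLoader loader) {
        LOGGERS.keySet().removeIf(c -> c.getClassLoader() == loader);
    }
}

class WebappSimulation {
    // Defines a class inside a fresh "webapp" classloader and logs from it.
    static ClassLoader deploy() {
        ClassLoader webapp = new URLClassLoader(
                new URL[0], WebappSimulation.class.getClassLoader());
        // The proxy class is defined by the loader passed in here.
        Object myClassInstance = Proxy.newProxyInstance(
                webapp, new Class<?>[] {Runnable.class},
                (proxy, method, args) -> null);
        SharedLogRegistry.getLogger(myClassInstance.getClass());
        return webapp;
    }
}
```

As long as the registry holds the Class key, the loader returned by
deploy() cannot be collected. Calling release(loader) on undeploy is one
way to break the chain; a weakly keyed cache may be another.
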
>>=20
>> So just pointing out the practical issues, that's it.
>>=20
>> Darren
>>=20
>> On Aug 20, 2013, at 5:31 PM, John Burwell <jburwell@basho.com> wrote:
>>=20
>> Darren,
>>=20
>> Actually, loading and unloading aren't difficult if resource =
management and
>> drivers work within the following constraints/assumptions:
>>=20
>> - Drivers are transient and stateless
>> - A driver instance is assigned per resource managed (i.e. no =
singletons)
>> - A lightweight thread and mailbox (i.e. actor model) are assigned per
>>   resource managed (outlined in the presentation referenced below)
>>=20
>>=20
>> Based on these constraints and assumptions, the following upgrade =
process
>> could be implemented:
>>=20
>> 1. Load and verify new driver version to make it available
>> 2. Notify the supervisor processes of each affected resource that a =
new driver
>>    is available
>> 3. Upon completion of the current message being processed by its =
associated
>>    actor, the supervisor kills and respawns the actor managing its =
associated
>>    resource
>> 4. As part of startup, the supervisor injects an instance of the new =
driver
>>    version and the actor resumes processing messages in its mailbox
>>=20
>>=20
>> This process mirrors the process that would occur on management =
server
>> startup for each resource minus killing an existing actor instance.
>> Eventually, the system will upgrade the driver without loss of =
operation.
>> More sophisticated policies could be added, but I think this approach =
would
>> be a solid default upgrade behavior.  As a bonus, this same approach =
could
>> also be applied to global configuration settings -- allowing the =
system to
>> apply changes to these values without restarting the system.
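
The upgrade steps quoted above can be sketched roughly as follows. This
is a single-threaded simulation under stated assumptions, not CloudStack
code: Driver, ResourceActor, and EchoDriver are hypothetical names, and a
real actor would run on its own lightweight thread with a supervisor
process around it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical driver contract: stateless, one instance per resource.
interface Driver {
    String handle(String message);
}

// Trivial driver that tags each message with its version.
class EchoDriver implements Driver {
    private final String version;

    EchoDriver(String version) {
        this.version = version;
    }

    public String handle(String message) {
        return version + ":" + message;
    }
}

// One actor per managed resource: a mailbox plus the current driver.
class ResourceActor {
    private final Queue<String> mailbox = new ArrayDeque<>();
    private final List<String> results = new ArrayList<>();
    private Driver driver;
    private Driver pendingUpgrade; // null when no upgrade is waiting

    ResourceActor(Driver initial) {
        this.driver = initial;
    }

    void send(String message) {
        mailbox.add(message);
    }

    // Step 2: a new, already verified driver version is announced.
    void notifyUpgrade(Driver verified) {
        pendingUpgrade = verified;
    }

    // Handles one message. Any pending upgrade is applied first, which
    // is always between messages and never in the middle of one.
    boolean processNext() {
        if (pendingUpgrade != null) {
            // Steps 3 and 4: "respawn" with the new driver instance.
            // The mailbox is untouched, so queued work is not lost.
            driver = pendingUpgrade;
            pendingUpgrade = null;
        }
        String message = mailbox.poll();
        if (message == null) {
            return false;
        }
        results.add(driver.handle(message));
        return true;
    }

    List<String> results() {
        return results;
    }
}
```

With two queued messages and an upgrade announced after the first has
been handled, the first message is served by the old driver and the
second by the new one, with no restart in between.
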
>>=20
>> In summary, CloudStack and Eclipse are very different types of =
systems.
>> Eclipse is a desktop application implementing complex workflows, user
>> interactions, and management of shared state (e.g. project structure, =
AST,
>> compiler status, etc).  In contrast, CloudStack is an eventually =
consistent
>> distributed system performing automation control.  As such, its =
plugin
>> requirements are not only very different, but IMHO, much =
simpler.
>>=20
>> Thanks,
>> -John
>>=20
>> On Aug 20, 2013, at 7:44 PM, Darren Shepherd =
<darren.s.shepherd@gmail.com>
>> wrote:
>>=20
>> I know this isn't terribly useful, but I've been drawing a lot of =
squares
>> and circles and lines that connect those squares and circles lately =
and I
>> have a lot of architectural ideas for CloudStack.  At the rate I'm =
going it
>> will take me about two weeks to put together a discussion/proposal =
for the
>> community.  What I'm thinking is a superset of what you've listed out =
and
>> should align with your idea of a CAR.  The focus has a lot to do =
with
>> modularity and extensibility.
>>=20
>> So more to come soon....  I will say one thing, though: with Java you =
end
>> up having a hard time doing dynamic loading and unloading of modules.  =
There
>> are plenty of frameworks that try really hard to do this right, like =
OSGi, but
>> it's darn near impossible to do it right because of class loading and =
GC
>> issues (and that's why Eclipse has you restart after installing =
plugins even
>> though it is OSGi).
>>=20
>> I do believe that CloudStack should be capable of zero downtime =
maintenance
>> and have ideas around that, but at the end of the day, for plenty of
>> practical reasons, you still need a JVM restart if modules change.
>>=20
>> Darren
>>=20
>> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski =
<mike.tutkowski@solidfire.com>
>> wrote:
>>=20
>> I agree, John - let's get consensus first, then talk time tables.
>>=20
>>=20
>> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jburwell@basho.com> =
wrote:
>>=20
>> Mike,
>>=20
>> Before we can dig into timelines or implementations, I think we need =
to
>> get consensus on the problem to be solved and the goals.  Once we have a
>> proper understanding of the scope, I believe we can chunk the work =
across a set
>> of development lifecycles.  The subject is vast, but it also has a
>> far-reaching impact on both the storage and network layer evolution =
efforts.
>> As such, I believe we need to start addressing it as part of the next
>> release.
>>=20
>> As a separate thread, we need to discuss the timeline for the next
>> release.  I think we need to avoid the time compression caused by the
>> overlap of the 4.1 stabilization effort and 4.2 development.  =
Therefore, I
>> don't think we should consider development of the next release =
started
>> until the first 4.2 RC is released.  I will try to open a separate =
discuss
>> thread for this topic, as well as tying in the discussion of release =
code
>> names.
>>=20
>> Thanks,
>> -John
>>=20
>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski =
<mike.tutkowski@solidfire.com>
>> wrote:
>>=20
>> Hey John,
>>=20
>> I think this is some great stuff. Thanks for the write up.
>>=20
>> It looks like you have ideas around what might go into a first =
release of
>> this plug-in framework. Were you thinking we'd have enough time to
>> squeeze that first rev into 4.3? I'm just wondering (it's not a huge
>> deal to hit that release for this) because we would only have about
>> five weeks.
>>=20
>> Thanks
>>=20
>>=20
>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburwell@basho.com>
>>=20
>> wrote:
>>=20
>>=20
>> All,
>>=20
>> In capturing my thoughts on storage, my thinking backed into the =
driver
>> model.  While we have the beginnings of such a model today, I see the
>> following deficiencies:
>>=20
>>=20
>> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
>> each have a slightly different model for allowing system
>> functionality to be extended/substituted.  These differences increase
>> the barrier to entry for vendors seeking to extend CloudStack and
>> accrete code paths to be maintained and verified.
>> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
>> configuration file.  In addition to being operator unfriendly (most
>> sysadmins are not Spring experts nor do they want to be), we expose
>> the core bootstrapping mechanism to operators.  Therefore, a
>> misconfiguration could negatively impact the injection/configuration
>> of internal management server components.  Essentially, we are
>> handing them a loaded shotgun pointed at our right foot.
>> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
>> mechanism is Spring, the management server has little control over
>> the timing and order of component loading/unloading.  Changes to the
>> Management Server's component dependency graph could break a driver
>> by causing it to be started at an unexpected time.
>> 4. *Lack of Execution Isolation*: As Spring components, plugins are
>> loaded into the same execution context as core management server
>> components.  Therefore, an errant plugin can corrupt the entire
>> management server.
>>=20
>>=20
>> For the next revision of the plugin/driver mechanism, I would like to
>> see us migrate towards a standard pluggable driver model that supports
>> all of the management server's extension points (e.g. network devices,
>> storage devices, hypervisors, etc) with the following capabilities:
>>=20
>>=20
>> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
>> common state machine and categorization (e.g. network, storage,
>> hypervisor, etc) that permits the deterministic calculation of
>> initialization and destruction order (i.e. network layer drivers ->
>> storage layer drivers -> hypervisor drivers).  Plugin
>> inter-dependencies would be supported between plugins sharing the
>> same category.
>> - *In-process Installation and Upgrade*: Adding or upgrading a driver
>> does not require the management server to be restarted.  This
>> capability implies a system that supports the simultaneous execution
>> of multiple driver versions and the ability to suspend execution of
>> work on a resource while the underlying driver instance is replaced.
>> - *Execution Isolation*: The deployment packaging and execution
>> environment allow different (and potentially conflicting) versions
>> of dependencies to be used simultaneously.  Additionally, plugins
>> would be sufficiently sandboxed to protect the management server
>> against driver instability.
>> - *Extension Data Model*: Drivers provide a property bag with a
>> metadata descriptor to validate and render vendor specific data.  The
>> contents of this property bag will be provided to every driver
>> operation invocation at runtime.  The metadata descriptor would be a
>> lightweight description that provides a label resource key, a
>> description resource key, data type (string, date, number, boolean),
>> required flag, and optional length limit.
>> - *Introspection*: Administrative APIs/UIs allow operators to
>> understand which drivers are in the system, their configuration, and
>> their current state.
>> - *Discoverability*: Optionally, drivers can be discovered via a
>> project repository definition (similar to Yum) allowing drivers to be
>> remotely acquired and operators to be notified regarding update
>> availability.  The project would also provide, free of charge,
>> certificates to sign plugins.  This mechanism would support local
>> mirroring to support air gapped management networks.
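
As a rough illustration of the Extension Data Model item above, the
metadata descriptor and property bag check might look like the sketch
below. Everything here is an assumption for discussion: the class names,
the error strings, the use of ISO-8601 (yyyy-MM-dd) for the date type,
and the choice to carry property values as strings.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The four data types named in the proposal.
enum PropertyType { STRING, DATE, NUMBER, BOOLEAN }

// Hypothetical descriptor for one vendor specific property.
final class PropertyDescriptor {
    final String name;
    final String labelKey;       // resource key for the UI label
    final String descriptionKey; // resource key for the help text
    final PropertyType type;
    final boolean required;
    final Integer maxLength;     // optional; null means no limit

    PropertyDescriptor(String name, String labelKey,
                       String descriptionKey, PropertyType type,
                       boolean required, Integer maxLength) {
        this.name = name;
        this.labelKey = labelKey;
        this.descriptionKey = descriptionKey;
        this.type = type;
        this.required = required;
        this.maxLength = maxLength;
    }

    // Returns an error message, or null if the raw value is acceptable.
    String validate(String raw) {
        if (raw == null || raw.isEmpty()) {
            return required ? name + " is required" : null;
        }
        if (maxLength != null && raw.length() > maxLength) {
            return name + " is longer than " + maxLength;
        }
        try {
            switch (type) {
                case NUMBER:
                    new BigDecimal(raw);
                    break;
                case DATE:
                    LocalDate.parse(raw); // assumes ISO-8601 dates
                    break;
                case BOOLEAN:
                    if (!raw.equals("true") && !raw.equals("false")) {
                        return name + " is not a valid " + type;
                    }
                    break;
                default:
                    break; // STRING needs no further checks
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return name + " is not a valid " + type;
        }
        return null;
    }
}

final class PropertyBagValidator {
    // Checks a property bag against its descriptors, in descriptor order.
    static List<String> validate(List<PropertyDescriptor> descriptors,
                                 Map<String, String> bag) {
        List<String> errors = new ArrayList<>();
        for (PropertyDescriptor d : descriptors) {
            String error = d.validate(bag.get(d.name));
            if (error != null) {
                errors.add(error);
            }
        }
        return errors;
    }
}
```

The same descriptors could drive both API validation and UI rendering,
since the label and description are resource keys rather than text.
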
>>=20
>>=20
>> Fundamentally, I do not want to turn CloudStack into an erector set
>> with more screws than nuts, which is a risk with highly pluggable
>> architectures.  As such, I think we would need to tightly bound the
>> scope of drivers and their behaviors to prevent the loss of system
>> usability and stability.  My thinking is that drivers would be
>> packaged into a custom JAR, a CAR (CloudStack ARchive), that would be
>> structured as follows:
>>=20
>>=20
>> - META-INF
>>   - MANIFEST.MF
>>   - driver.yaml (driver metadata (e.g. version, name, description,
>>   etc) serialized in YAML format)
>>   - LICENSE (a text file containing the driver's license)
>> - lib (driver dependencies)
>> - classes (driver implementation)
>> - resources (driver message files and potentially JS resources)
>>=20
>>=20
>> The management server would acquire drivers through a simple scan of
>> a URL (e.g. file directory, S3 bucket, etc).  For every CAR object
>> found, the management server would create an execution environment
>> (likely a dedicated ExecutorService and Classloader), and transition
>> the state of the driver to Running (the exact state model would need
>> to be worked out).  To be really nice, we could develop a custom Ant
>> task/Maven plugin/Gradle plugin to create CARs.   I can also imagine
>> opportunities to add hooks to this model to register instrumentation
>> information with JMX and authorization.
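
A minimal sketch of the per-CAR execution environment described above,
assuming a local directory scan only. The DriverEnvironment class, the
".car" file extension, and the three placeholder states are all
hypothetical, and pointing a plain URLClassLoader at the archive is a
simplification: a CAR with nested lib/ and classes/ entries would need a
custom classloader.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Assumed placeholder states; the real state model is still open.
enum DriverState { LOADED, RUNNING, STOPPED }

// One classloader and one executor per CAR, so that a driver's classes
// and threads are kept apart from other drivers and from the core.
final class DriverEnvironment {
    private final URLClassLoader loader;
    private final ExecutorService executor;
    private volatile DriverState state = DriverState.LOADED;

    DriverEnvironment(File car) throws IOException {
        URL[] urls = { car.toURI().toURL() };
        final URLClassLoader cl = new URLClassLoader(
                urls, DriverEnvironment.class.getClassLoader());
        this.loader = cl;
        this.executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread t = new Thread(runnable, "driver-" + car.getName());
            t.setContextClassLoader(cl);
            t.setDaemon(true);
            return t;
        });
    }

    // Scans a local directory for CAR files in a stable, sorted order.
    static List<File> findCars(File directory) {
        File[] found = directory.listFiles(
                (dir, name) -> name.endsWith(".car"));
        if (found == null) {
            return Collections.emptyList();
        }
        Arrays.sort(found);
        return new ArrayList<>(Arrays.asList(found));
    }

    void start() {
        state = DriverState.RUNNING;
    }

    // Runs driver work on the driver's own thread and classloader.
    <T> Future<T> submit(Callable<T> work) {
        if (state != DriverState.RUNNING) {
            throw new IllegalStateException("driver is " + state);
        }
        return executor.submit(work);
    }

    void stop() throws IOException {
        state = DriverState.STOPPED;
        executor.shutdownNow();
        loader.close();
    }

    ClassLoader loader() {
        return loader;
    }

    DriverState state() {
        return state;
    }
}
```

Sorting the scan result is one small step toward the deterministic load
order discussed earlier; ordering by driver category would sit on top.
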
>>=20
>>=20
>> To keep the scope of this email confined, we would introduce the
>> general notion of a Resource, and (hand wave hand wave) eventually
>> compartmentalize the execution of work around a resource [1].  This
>> (hand waved) compartmentalization would allow us the controls
>> necessary to safely and reliably perform in-place driver upgrades.
>> For an initial release, I would recommend implementing the
>> abstractions, loading mechanism, extension data model, and discovery
>> features.  With these capabilities in place, we could attack the
>> in-place upgrade model.
>>=20
>> If we were to adopt such a pluggable capability, we would have the
>> opportunity to decouple the vendor and CloudStack release schedules.
>> For example, if a vendor were introducing a new product that required
>> a new or updated driver, they would no longer need to wait for a
>> CloudStack release to support it.  They would also gain the ability
>> to fix high priority defects in the same manner.
>>=20
>> I have hand waved a number of issues that would need to be resolved
>> before such an approach could be implemented.  However, I think we
>> need to decide, as a community, that it is worth devoting energy and
>> effort to enhancing the plugin/driver model, and agree on the goals
>> of that effort, before diving head first into the deep rabbit hole of
>> design/implementation.
>>=20
>> Thoughts? (/me ducks)
>> -John
>>=20
>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
>>=20
>> =
http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stac=
k-distributed-process-management
>>=20
>>=20
>>=20
>>=20
>> --
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkowski@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the
>> cloud<http://solidfire.com/solution/overview/?video=3Dplay>
>> *=99*
>>=20
>>=20
>>=20


--Apple-Mail=_FA0DC2A3-BF85-43C3-AC2D-3AAAD431D104
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html;
	charset=windows-1252

<html><head><meta http-equiv=3D"Content-Type" content=3D"text/html =
charset=3Dwindows-1252"></head><body style=3D"word-wrap: break-word; =
-webkit-nbsp-mode: space; -webkit-line-break: after-white-space; =
">Daan,<div><br></div><div>I have the following issues with =
OSGi:&nbsp;</div><div><br></div><div><ul =
class=3D"MailOutline"><li>Complexity: &nbsp;Building OSGi components =
adds a tremendous amount of complexity to both building drivers and =
debugging runtime issues. &nbsp;Additionally, OSGi has a much broader =
feature set than I think CloudStack needs to support. &nbsp;Therefore, =
driver authors may use the feature set in unanticipated ways that create =
system instability.</li><li>Dependency Hell: OSGi requires 3rd party =
dependencies to be packaged as OSGi bundles. &nbsp;In practice, many =
third party libraries either have issues that prevent them from being =
bundles or their OSGi bundled versions are behind the mainline =
release.</li></ul></div><div><br></div><div>Speaking from personal =
experience, I do not want to re-create the mess that is Eclipse (i.e. an =
erector set with more screws than nuts). &nbsp;In addition to its lack =
of reliability, it is incredibly difficult to comprehend how the =
component configurations and relationships are composed at =
runtime.</div><div><br></div><div>To be clear, I am not interested in =
creating a general purpose component/plugin model. &nbsp;Fundamentally, =
we need a simple, purpose-built component model focused on providing =
stability and reliability through deterministic behavior rather than =
feature flexibility. &nbsp;Unfortunately, OSGi and Spring both focus =
on the latter, which makes them ill-suited for our =
purposes.</div><div><br></div><div>Thanks,</div><div>-John</div><div><br><=
div><div>On Aug 21, 2013, at 2:31 AM, Daan Hoogland &lt;<a =
href=3D"mailto:daan.hoogland@gmail.com">daan.hoogland@gmail.com</a>&gt; =
wrote:</div><br class=3D"Apple-interchange-newline"><blockquote =
type=3D"cite">John,<br><br>Nice work.<br>Given the maturity of OSGi, I'd =
say lets see how it fits. One criteria<br>would be can we limit the =
bundles that may be loaded based on what<br>Cloudstack supports (and not =
allow loading pydev) if not we need to<br>bake our own.<br><br>But =
though I think your work is valuable I disagree on designing our<br>CARs =
from the get go without having explored usable options in the<br>field =
first. A new type of YARs is not what the world or cloudstack<br>needs. =
And given what you have written the main problem wll be finding<br>a =
framework we can restrict to what we want, not one that can do all<br>of =
it.<br><br>done shooting,<br>Daan<br><br>On Wed, Aug 21, 2013 at 2:52 =
AM, Darren Shepherd<br>&lt;<a =
href=3D"mailto:darren.s.shepherd@gmail.com">darren.s.shepherd@gmail.com</a=
>&gt; wrote:<br><blockquote type=3D"cite">Sure, I fully understand how =
it theoretically works, but I'm saying from a<br>practical perspective =
it always seems to fall apart. &nbsp;What your describing<br>is done =
excellently in OSGI 4.2 Blueprint. &nbsp;It's a beautiful framework =
that<br>allows you to expose services that can be dynamically updated at =
runtime.<br><br>The issues always happens with unloading. &nbsp;I'll =
give you a real world<br>example. &nbsp;As part of the servlet spec your =
supposed to be able to stop and<br>unload wars. &nbsp;But in practice if =
you do it enough times you typically run<br>out of memory. &nbsp;So one =
such issue was with commons logging (since fixed).<br>When you do =
getLogger(myclass.class) it would cache a reference of the =
Class<br>object to the actual log impl. &nbsp;The commons logging jar is =
typically loaded<br>with a system classloader and but MyClass.class =
would be loaded in the<br>webapp classloader. &nbsp;So when you stop the =
war there is a reference chain<br>system classloader -&gt; logfactory =
-&gt; Myclass -&gt; webapp classloader. &nbsp;So the<br>web app never =
gets GC'd.<br><br>So just pointing out the practical issues, that's =
it.<br><br>Darren<br><br>On Aug 20, 2013, at 5:31 PM, John Burwell =
&lt;<a href=3D"mailto:jburwell@basho.com">jburwell@basho.com</a>&gt; =
wrote:<br><br>Darren,<br><br>Actually, loading and unloading aren't =
difficult if resource management and<br>drivers work within the =
following constraints/assumptions:<br><br>Drivers are transient and =
stateless<br>A driver instance is assigned per resource managed (i.e. no =
singletons)<br>A lightweight thread and mailbox (i.e. actor model) are =
assigned per<br>resource managed (outlined in the presentation =
referenced below)<br><br><br>Based on these constraints and assumptions, =
the following upgrade process<br>could be implemented:<br><br>Load and =
verify new driver version to make it available<br>Notify the supervisor =
processes of each affected resource that a new driver<br>is =
available<br>Upon completion of the current message being processed by =
its associated<br>actor, the supervisor kills and respawns the actor =
managing its associated<br>resource<br>As part of startup, the =
supervisor injects an instance of the new driver<br>version and the =
actor resumes processing messages in its mailbox<br><br><br>This process =
mirrors the process that would occur on management server<br>startup for =
each resource minus killing an existing actor instance.<br>Eventually, =
the system will upgrade the driver without loss of operation.<br>More =
sophisticated policies could be added, but I think this approach =
would<br>be a solid default upgrade behavior. &nbsp;As a bonus, this =
same approach could<br>also be applied to global configuration settings =
-- allowing the system to<br>apply changes to these values without =
restarting the system.<br><br>In summary, CloudStack and Eclipse are =
very different types of systems.<br>Eclipse is a desktop application =
implementing complex workflows, user<br>interactions, and management of =
shared state (e.g. project structure, AST,<br>compiler status, etc). =
&nbsp;In contrast, CloudStack is an eventually consistent<br>distributed =
system performing automation control. &nbsp;As such, its =
requirements<br>plugin requirements are not only very different, but =
IMHO, much simpler.<br><br>Thanks,<br>-John<br><br>On Aug 20, 2013, at =
7:44 PM, Darren Shepherd &lt;<a =
href=3D"mailto:darren.s.shepherd@gmail.com">darren.s.shepherd@gmail.com</a=
>&gt;<br>wrote:<br><br>I know this isn't terribly useful, but I've been =
drawing a lot of squares<br>and circles and lines that connect those =
squares and circles lately and I<br>have a lot of architectural ideas =
for CloudStack. &nbsp;At the rate I'm going it<br>will take me about two =
weeks to put together a discussion/proposal for the<br>community. =
&nbsp;What I'm thinking is a superset of what you've listed out =
and<br>should align with your idea of a CAR. &nbsp;The focus has a a lot =
to do with<br>modularity and extensibility.<br><br>So more to come =
soon.... &nbsp;I will say one thing though, is with java you end<br>up =
having a hard time doing dynamic load and unloading of modules. =
&nbsp;There's<br>plenty of frameworks that try really hard to do this =
right, like OSGI, but<br>its darn near impossible to do it right because =
of class loading and GC<br>issues (and that's why Eclipse has you =
restart after installing plugs even<br>though it is OSGi).<br><br>I do =
believe that CloudStack should be possible of zero downtime =
maintenance<br>and have ideas around that, but at the end of the day, =
for plenty of<br>practical reasons, you still need a JVM restart if =
modules change.<br><br>Darren<br><br>On Aug 20, 2013, at 3:39 PM, Mike =
Tutkowski &lt;<a =
href=3D"mailto:mike.tutkowski@solidfire.com">mike.tutkowski@solidfire.com<=
/a>&gt;<br>wrote:<br><br>I agree, John - let's get consensus first, then =
talk time tables.<br><br><br>On Tue, Aug 20, 2013 at 4:31 PM, John =
Burwell &lt;<a =
href=3D"mailto:jburwell@basho.com">jburwell@basho.com</a>&gt; =
wrote:<br><br>Mike,<br><br>Before we can dig into timelines or =
implementations, I think we need to<br>get consensus on the problem to =
solved and the goals. &nbsp;Once we have a<br>proper understanding of =
the scope, I believe we can chunk the across a set<br>of development =
lifecycle. &nbsp;The subject is vast, but it also has a far<br>reaching =
impact to both the storage and network layer evolution efforts.<br>As =
such, I believe we need to start addressing it as part of the =
next<br>release.<br><br>As a separate thread, we need to discuss the =
timeline for the next<br>release. &nbsp;I think we need to avoid the =
time compression caused by the<br>overlap of the 4.1 stabilization =
effort and 4.2 development. &nbsp;Therefore, I<br>don't think we should =
consider development of the next release started<br>until the first 4.2 =
RC is released. &nbsp;I will try to open a separate discuss<br>thread =
for this topic, as well as, tying of the discussion of release =
code<br>names.<br><br>Thanks,<br>-John<br><br>On Aug 20, 2013, at 6:22 =
PM, Mike Tutkowski &lt;<a =
href=3D"mailto:mike.tutkowski@solidfire.com">mike.tutkowski@solidfire.com<=
/a>&gt;<br>wrote:<br><br>Hey John,<br><br>I think this is some great =
stuff. Thanks for the write up.<br><br>It looks like you have ideas =
around what might go into a first release of<br>this plug-in framework. =
Were you thinking we'd have enough time to<br><br>squeeze<br><br>that =
first rev into 4.3. I'm just wondering (it's not a huge deal to =
hit<br>that release for this) because we would only have about five =
weeks.<br><br>Thanks<br><br><br>On Tue, Aug 20, 2013 at 3:43 PM, John =
Burwell &lt;<a =
href=3D"mailto:jburwell@basho.com">jburwell@basho.com</a>&gt;<br><br>wrote=
:<br><br><br>All,<br><br>In capturing my thoughts on storage, my =
thinking backed into the driver<br>model. &nbsp;While we have the =
beginnings of such a model today, I see the<br>following =
deficiencies:<br><br><br>1. *Multiple Models*: The Storage, Hypervisor, =
and Security layers<br>each have a slightly different model for allowing =
system<br><br>functionality to<br><br>be extended/substituted. =
&nbsp;These differences increase the barrier of<br><br>entry<br><br>for =
vendors seeking to extend CloudStack and accrete code paths to =
be<br>maintained and verified.<br>2. *Leaky Abstraction*: &nbsp;Plugins =
are registered through a Spring<br>configuration file. &nbsp;In addition =
to being operator unfriendly (most<br>sysadmins are not Spring experts =
nor do they want to be), we expose<br><br>the<br><br>core bootstrapping =
mechanism to operators. &nbsp;Therefore, =
a<br><br>misconfiguration<br><br>could negatively impact the =
injection/configuration of internal<br><br>management<br><br>server =
components. &nbsp;Essentially handing them a loaded shotgun =
pointed<br><br>at<br><br>our right foot.<br>3. *Nondeterministic =
Load/Unload Model*: &nbsp;Because the core loading<br>mechanism is =
Spring, the management has little control over the<br><br>timing =
and<br><br>order of component loading/unloading. &nbsp;Changes to the =
Management<br><br>Server's<br><br>component dependency graph could break =
a driver by causing it to be<br><br>started<br><br>at an unexpected =
time.<br>4. *Lack of Execution Isolation*: As a Spring component, =
plugins are<br>loaded into the same execution context as core management =
server<br>components. &nbsp;Therefore, an errant plugin can corrupt the =
entire<br><br>management<br><br>server.<br><br><br>For next revision of =
the plugin/driver mechanism, I would like see us<br>migrate towards a =
standard pluggable driver model that supports all =
of<br><br>the<br><br>management server's extension points (e.g. network =
devices, storage<br>devices, hypervisors, etc) with the following =
capabilities:<br><br><br>- *Consolidated Lifecycle and Startup =
Procedure*: &nbsp;Drivers share a<br>common state machine and =
categorization (e.g. network, storage,<br><br>hypervisor,<br><br>etc) =
that permits the deterministic calculation of initialization =
and<br>destruction order (i.e. network layer drivers -&gt; storage =
layer<br><br>drivers -&gt;<br><br>hypervisor drivers). &nbsp;Plugin =
inter-dependencies would be supported<br><br>between<br><br>plugins =
sharing the same category.<br>- *In-process Installation and Upgrade*: =
Adding or upgrading a driver<br>does not require the management server =
to be restarted. &nbsp;This<br><br>capability<br><br>implies a system =
that supports the simultaneous execution of multiple<br>driver versions =
and the ability to suspend continued execution work<br><br>on =
a<br><br>resource while the underlying driver instance is replaced.<br>- =
*Execution Isolation*: The deployment packaging and =
execution<br>environment supports different (and potentially =
conflicting) versions<br><br>of<br><br>dependencies to be simultaneously =
used. &nbsp;Additionally, plugins would<br><br>be<br><br>sufficiently =
sandboxed to protect the management server against =
driver<br>instability.<br>- *Extension Data Model*: Drivers provide a =
property bag with a<br>metadata descriptor to validate and render vendor =
specific data. &nbsp;The<br>contents of this property bag will provided =
to every driver operation<br>invocation at runtime. &nbsp;The metadata =
descriptor would be a lightweight<br>description that provides a label =
resource key, a description<br><br>resource key,<br><br>data type =
(string, date, number, boolean), required flag, and optional<br>length =
limit.<br>- *Introspection*: Administrative APIs/UIs allow operators =
to<br>understand which drivers are installed in the system, =
their<br>configuration, and their current state.<br>- =
*Discoverability*: Optionally, drivers can be discovered via =
a<br>project repository definition (similar to Yum) allowing drivers to =
be<br>remotely acquired and operators to be notified regarding =
update<br>availability. &nbsp;The project would also provide, free of =
charge,<br><br>certificates<br><br>to sign plugins. &nbsp;This mechanism =
would support local mirroring to<br><br>support<br><br>air gapped =
management networks.<br><br><br>Fundamentally, I do not want to turn =
CloudStack into an erector set with<br>more screws than nuts which is a =
risk with highly pluggable<br><br>architectures.<br><br>As such, I think =
we would need to tightly bound the scope of drivers and<br>their =
behaviors to prevent the loss of system usability and stability. =
&nbsp;My<br>thinking is that drivers would be packaged into a custom =
JAR, CAR<br>(CloudStack ARchive), that would be structured as =
follows:<br><br><br>- META-INF<br> &nbsp;&nbsp;- MANIFEST.MF<br> =
&nbsp;&nbsp;- driver.yaml (driver metadata (e.g. version, name, =
description,<br> &nbsp;&nbsp;etc) serialized in YAML format)<br> =
&nbsp;&nbsp;- LICENSE (a text file containing the driver's license)<br>- =
lib (driver dependencies)<br>- classes (driver implementation)<br>- =
resources (driver message files and potentially JS =
resources)<br><br><br>The management server would acquire drivers =
through a simple scan of a<br><br>URL<br><br>(e.g. file directory, S3 =
bucket, etc). &nbsp;For every CAR object found, the<br>management server =
would create an execution environment (likely =
a<br><br>dedicated<br><br>ExecutorService and Classloader), and =
transition the state of the<br><br>driver to<br><br>Running (the exact =
state model would need to be worked out). &nbsp;To =
be<br><br>really<br><br>nice, we could develop a custom Ant task/Maven =
plugin/Gradle plugin to<br>create CARs. &nbsp;&nbsp;I can also imagine =
an opportunity to add hooks to this<br>model to register =
instrumentation information with JMX =
and<br><br>authorization.<br><br><br>To keep the scope of this email =
confined, we would introduce the general<br>notion of a Resource, and =
(hand wave hand wave) eventually<br><br>compartmentalize<br><br>the =
execution of work around a resource [1]. &nbsp;This (hand =
waved)<br>compartmentalization would allow us the controls necessary to =
safely and<br>reliably perform in-place driver upgrades. &nbsp;For an =
initial release, I<br><br>would<br><br>recommend implementing the =
abstractions, loading mechanism, extension<br><br>data<br><br>model, and =
discovery features. &nbsp;With these capabilities in place, =
we<br><br>could<br><br>attack the in-place upgrade model.<br><br>If we =
were to adopt such a pluggable capability, we would have =
the<br>opportunity to decouple the vendor and CloudStack release =
schedules.<br><br>For<br><br>example, if a vendor were introducing a new =
product that required a new<br><br>or<br><br>updated driver, they would =
no longer need to wait for a CloudStack<br><br>release<br><br>to support =
it. &nbsp;They would also gain the ability to fix high =
priority<br>defects in the same manner.<br><br>I have hand waved a =
number of issues that would need to be =
resolved<br><br>before<br><br>such an approach could be implemented. =
&nbsp;However, I think we need to<br><br>decide,<br><br>as a community, =
that it is worth devoting energy and effort to =
enhancing<br><br>the<br><br>plugin/driver model and the goals of that =
effort before diving head<br><br>first<br><br>into the deep rabbit hole =
of design/implementation.<br><br>Thoughts? (/me =
ducks)<br>-John<br><br>[1]: My opinions on the matter from CloudStack =
Collab 2013 -&gt;<br><br><a =
href=3D"http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cl=
oud-stack-distributed-process-management">http://www.slideshare.net/JohnBu=
rwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management=
</a><br><br><br><br><br>--<br>*Mike Tutkowski*<br>*Senior CloudStack =
Developer, SolidFire Inc.*<br>e: mike.tutkowski@solidfire.com<br>o: =
303.746.7302<br>Advancing the way the world uses =
the<br>cloud&lt;http://solidfire.com/solution/overview/?video=3Dplay&gt;<b=
r>*=99*<br><br><br></blockquote></blockquote></div><br></div></body></html=
>=

--Apple-Mail=_FA0DC2A3-BF85-43C3-AC2D-3AAAD431D104--

--Apple-Mail=_8426603C-1408-453B-95B4-97E0DCA5705B
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJSFGV/AAoJEOXds3llSIFTN4AIAJv+iRAX4bM3BX9lu1PFvf4o
bysAK51j8Wc0fxgqOhL88QX1Phx9W0iHRcYCATGZTCt7nd1G1ZgDolQjZo6Op6Vg
2+cpFbFqRn/a4l2nmb5sFK32bJA1PQD9+0Ww0fPbzB72p/S997mxdX0lA0WIXTwb
AEs/nMOSI1GIpOHOpOpERiafxMQB3kH0yt9+5f0dChzTxZx1Qcz83ZONRxW2NEEr
S5Sxk9DOsPRjr6b9C9ib21UNJOV6EQgb3pgCu6l8pNe+fcdIGBKSSaFK2SvjNTdw
ipnTvzn/ZtHdPt37bhKvwdcvG7VJexp44+gVTGNR+KZpWxOOfqxBJQJ3PMsN3Ec=
=jIct
-----END PGP SIGNATURE-----

--Apple-Mail=_8426603C-1408-453B-95B4-97E0DCA5705B--