From: John Burwell <jburwell@basho.com>
To: dev@cloudstack.apache.org
Cc: Darren Shepherd, Hugo Trippaers, "La Motta, David"
Date: Wed, 21 Aug 2013 03:00:15 -0400
Subject: Re: [DISCUSS/PROPOSAL] Upgrading Driver Model
Message-Id: <9B40F0CE-0F16-41E5-B605-37770AD75C05@basho.com>

    Daan,

    I have the following issues with OSGi:

  • Complexity: Building OSGi components adds a tremendous amount of complexity to both building drivers and debugging runtime issues. Additionally, OSGi has a much broader feature set than I think CloudStack needs to support. Therefore, driver authors may use the feature set in unanticipated ways that create system instability.
  • Dependency Hell: OSGi requires 3rd party dependencies to be packaged as OSGi bundles. In practice, many third party libraries either have issues that prevent them from being bundled or their OSGi bundled versions are behind the mainline release.

  • Speaking from personal experience, I do not want to re-create the mess that is Eclipse (i.e. an erector set with more screws than nuts). In addition to its lack of reliability, it is incredibly difficult to comprehend how the component configurations and relationships are composed at runtime.

    To be clear, I am not interested in creating a general purpose component/plugin model. Fundamentally, we need a simple, purpose-built component model focused on providing stability and reliability through deterministic behavior rather than feature flexibility. Unfortunately, both OSGi and Spring focus on flexibility, which makes them ill-suited for our purposes.

    Thanks,
    -John

    On Aug 21, 2013, at 2:31 AM, Daan Hoogland <daan.hoogland@gmail.com> wrote:

    John,

    Nice work.
    Given the maturity of OSGi, I'd say let's see how it fits. One criterion
    would be whether we can limit the bundles that may be loaded based on what
    CloudStack supports (and not allow loading pydev); if not, we need to
    bake our own.

    But though I think your work is valuable, I disagree on designing our
    CARs from the get-go without having explored usable options in the
    field first. A new type of YARs is not what the world or CloudStack
    needs. And given what you have written, the main problem will be finding
    a framework we can restrict to what we want, not one that can do all
    of it.

    done shooting,
    Daan

    On Wed, Aug 21, 2013 at 2:52 AM, Darren Shepherd
    <darren.s.shepherd@gmail.com> wrote:
    Sure, I fully understand how it theoretically works, but I'm saying from a
    practical perspective it always seems to fall apart. What you're describing
    is done excellently in OSGi 4.2 Blueprint. It's a beautiful framework that
    allows you to expose services that can be dynamically updated at runtime.

    The issues always happen with unloading. I'll give you a real world
    example. As part of the servlet spec you're supposed to be able to stop and
    unload WARs. But in practice, if you do it enough times, you typically run
    out of memory. One such issue was with commons logging (since fixed).
    When you do getLogger(MyClass.class) it would cache a reference from the Class
    object to the actual log impl. The commons logging jar is typically loaded
    with a system classloader, but MyClass.class would be loaded in the
    webapp classloader. So when you stop the WAR there is a reference chain
    system classloader -> logfactory -> MyClass -> webapp classloader. So the
    web app never gets GC'd.
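
    To illustrate the reference chain described above, here is a minimal Java sketch of the pattern. `LeakyLogFactory` is a hypothetical stand-in and not the commons logging API: a static cache in a class loaded by a parent classloader holds strong references to `Class` objects, and each `Class` pins its defining classloader. The `release` method is an assumed cleanup hook that shows one way the chain could be broken when a module is unloaded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a logging factory loaded by a parent (system) classloader. */
final class LeakyLogFactory {
    // Strong references: every Class used as a key keeps its defining ClassLoader reachable.
    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    private LeakyLogFactory() {}

    /** Caches one "logger" per owning class, which is what creates the leak. */
    static String getLogger(Class<?> owner) {
        return CACHE.computeIfAbsent(owner, c -> "logger:" + c.getName());
    }

    /** Assumed cleanup hook: drops every entry whose key was defined by the given loader. */
    static int release(ClassLoader loader) {
        int before = CACHE.size();
        CACHE.keySet().removeIf(c -> c.getClassLoader() == loader);
        return before - CACHE.size();
    }

    static int cachedEntries() {
        return CACHE.size();
    }
}
```

    Unless something like `release` is called when a module stops, the cache keeps the module's classes, and therefore its classloader, reachable for the life of the JVM.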

    So just pointing out the practical issues, that's it.

    Darren

    On Aug 20, 2013, at 5:31 PM, John Burwell <jburwell@basho.com> wrote:

    Darren,

    Actually, loading and unloading aren't difficult if resource management and
    drivers work within the following constraints/assumptions:

    - Drivers are transient and stateless
    - A driver instance is assigned per resource managed (i.e. no singletons)
    - A lightweight thread and mailbox (i.e. actor model) are assigned per
      resource managed (outlined in the presentation referenced below)


    Based on these constraints and assumptions, the following upgrade process
    could be implemented:

    1. Load and verify the new driver version to make it available
    2. Notify the supervisor processes of each affected resource that a new driver
       is available
    3. Upon completion of the current message being processed by its associated
       actor, the supervisor kills and respawns the actor managing its associated
       resource
    4. As part of startup, the supervisor injects an instance of the new driver
       version and the actor resumes processing messages in its mailbox


    This process mirrors the process that would occur on management server
    startup for each resource, minus killing an existing actor instance.
    Eventually, the system will upgrade the driver without loss of operation.
    More sophisticated policies could be added, but I think this approach would
    be a solid default upgrade behavior. As a bonus, this same approach could
    also be applied to global configuration settings, allowing the system to
    apply changes to these values without restarting the system.
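
    The upgrade steps above could be sketched roughly as follows. This is a simplification under stated assumptions: `Driver` and `ResourceSupervisor` are hypothetical names, a single-thread executor stands in for the actor's mailbox, and the driver swap is queued as a mailbox task rather than actually killing and respawning an actor. It only demonstrates the ordering guarantee that a swap happens between two messages, never in the middle of one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical driver contract: stateless, one instance per managed resource. */
interface Driver {
    String version();
}

/** Hypothetical per-resource supervisor; a single-thread executor acts as the mailbox. */
final class ResourceSupervisor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final List<String> processed = new ArrayList<>(); // used only on the mailbox thread
    private Driver driver;                                     // replaced only on the mailbox thread

    ResourceSupervisor(Driver initial) {
        this.driver = initial;
    }

    /** Enqueues a unit of work; it runs with whichever driver is current at that point. */
    void send(String message) {
        mailbox.execute(() -> processed.add(message + "@" + driver.version()));
    }

    /** Enqueues the swap itself, so it can only take effect between two messages. */
    void upgrade(Driver newDriver) {
        mailbox.execute(() -> driver = newDriver);
    }

    /** Drains the mailbox, stops it, and returns what was processed, in order. */
    List<String> drain() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(processed);
        Future<List<String>> result = mailbox.submit(snapshot);
        mailbox.shutdown();
        try {
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("mailbox did not drain", e);
        }
    }
}
```

    Messages sent before `upgrade` are handled by the old driver and messages sent after it by the new one, without restarting the process.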

    In summary, CloudStack and Eclipse are very different types of systems.
    Eclipse is a desktop application implementing complex workflows, user
    interactions, and management of shared state (e.g. project structure, AST,
    compiler status, etc). In contrast, CloudStack is an eventually consistent
    distributed system performing automation control. As such, its
    plugin requirements are not only very different, but IMHO, much simpler.

    Thanks,
    -John

    On Aug 20, 2013, at 7:44 PM, Darren Shepherd <darren.s.shepherd@gmail.com> wrote:

    I know this isn't terribly useful, but I've been drawing a lot of squares
    and circles and lines that connect those squares and circles lately and I
    have a lot of architectural ideas for CloudStack. At the rate I'm going it
    will take me about two weeks to put together a discussion/proposal for the
    community. What I'm thinking is a superset of what you've listed out and
    should align with your idea of a CAR. The focus has a lot to do with
    modularity and extensibility.

    So more to come soon.... I will say one thing though: with Java you end
    up having a hard time doing dynamic loading and unloading of modules.
    There are plenty of frameworks that try really hard to do this right, like OSGi, but
    it's darn near impossible to do it right because of class loading and GC
    issues (and that's why Eclipse has you restart after installing plugins even
    though it is OSGi).

    I do believe that CloudStack should be capable of zero downtime maintenance
    and have ideas around that, but at the end of the day, for plenty of
    practical reasons, you still need a JVM restart if modules change.

    Darren

    On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <mike.tutkowski@solidfire.com> wrote:

    I agree, John - let's get consensus first, then talk time tables.


    On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jburwell@basho.com> wrote:

    Mike,

    Before we can dig into timelines or implementations, I think we need to
    get consensus on the problem to be solved and the goals. Once we have a
    proper understanding of the scope, I believe we can chunk the work across a set
    of development cycles. The subject is vast, but it also has a far
    reaching impact on both the storage and network layer evolution efforts.
    As such, I believe we need to start addressing it as part of the next
    release.

    As a separate thread, we need to discuss the timeline for the next
    release. I think we need to avoid the time compression caused by the
    overlap of the 4.1 stabilization effort and 4.2 development. Therefore, I
    don't think we should consider development of the next release started
    until the first 4.2 RC is released. I will try to open a separate discuss
    thread for this topic, as well as tying off the discussion of release code
    names.

    Thanks,
    -John

    On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mike.tutkowski@solidfire.com> wrote:

    Hey John,

    I think this is some great stuff. Thanks for the write up.

    It looks like you have ideas around what might go into a first release of
    this plug-in framework. Were you thinking we'd have enough time to squeeze
    that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
    that release for this) because we would only have about five weeks.

    Thanks


    On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburwell@basho.com> wrote:


    All,

    In capturing my thoughts on storage, my thinking backed into the driver
    model. While we have the beginnings of such a model today, I see the
    following deficiencies:


    1. *Multiple Models*: The Storage, Hypervisor, and Security layers
       each have a slightly different model for allowing system functionality to
       be extended/substituted. These differences increase the barrier to entry
       for vendors seeking to extend CloudStack and accrete code paths to be
       maintained and verified.
    2. *Leaky Abstraction*: Plugins are registered through a Spring
       configuration file. In addition to being operator unfriendly (most
       sysadmins are not Spring experts nor do they want to be), we expose the
       core bootstrapping mechanism to operators. Therefore, a misconfiguration
       could negatively impact the injection/configuration of internal management
       server components. We are essentially handing them a loaded shotgun pointed at
       our right foot.
    3. *Nondeterministic Load/Unload Model*: Because the core loading
       mechanism is Spring, the management server has little control over the timing and
       order of component loading/unloading. Changes to the Management Server's
       component dependency graph could break a driver by causing it to be started
       at an unexpected time.
    4. *Lack of Execution Isolation*: As Spring components, plugins are
       loaded into the same execution context as core management server
       components. Therefore, an errant plugin can corrupt the entire management
       server.


    For the next revision of the plugin/driver mechanism, I would like to see us
    migrate towards a standard pluggable driver model that supports all of the
    management server's extension points (e.g. network devices, storage
    devices, hypervisors, etc) with the following capabilities:


    - *Consolidated Lifecycle and Startup = Procedure*:  Drivers share a
    common state machine and = categorization (e.g. network, storage,

    hypervisor,

    etc) = that permits the deterministic calculation of initialization = and
    destruction order (i.e. network layer drivers -> storage = layer

    drivers ->

    hypervisor drivers).  Plugin = inter-dependencies would be supported

    between

    plugins = sharing the same category.
    - *In-process Installation and Upgrade*: = Adding or upgrading a driver
    does not require the management server = to be restarted.  This

    capability

    implies a system = that supports the simultaneous execution of multiple
    driver versions = and the ability to suspend continued execution work

    on = a

    resource while the underlying driver instance is replaced.
    - = *Execution Isolation*: The deployment packaging and = execution
    environment supports different (and potentially = conflicting) versions

    of

    dependencies to be simultaneously = used.  Additionally, plugins would

    be

    sufficiently = sandboxed to protect the management server against = driver
    instability.
    - *Extension Data Model*: Drivers provide a property bag with a
    metadata descriptor to validate and render vendor-specific data.  The
    contents of this property bag will be provided to every driver operation
    invocation at runtime.  The metadata descriptor would be a lightweight
    description that provides a label resource key, a description resource
    key, data type (string, date, number, boolean), required flag, and
    optional length limit.
    - *Introspection*: Administrative APIs/UIs allow operators to
    understand which drivers are installed in the system, their
    configuration, and their current state.
    - *Discoverability*: Optionally, drivers can be discovered via a
    project repository definition (similar to Yum), allowing drivers to be
    remotely acquired and operators to be notified regarding update
    availability.  The project would also provide, free of charge,
    certificates to sign plugins.  This mechanism would allow local
    mirroring to support air-gapped management networks.
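
To make the Extension Data Model item above more concrete, here is a minimal
sketch of what a descriptor and its validation step might look like.  Every
name in it (ExtensionDataModel, PropertyDescriptor, validate) is hypothetical
and not an existing CloudStack API.  It also assumes the length limit applies
only to string values, which the proposal does not specify.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; nothing here is an existing CloudStack API.
final class ExtensionDataModel {

    // Data types listed in the proposal: string, date, number, boolean.
    enum DataType { STRING, DATE, NUMBER, BOOLEAN }

    // One entry of the metadata descriptor.
    static final class PropertyDescriptor {
        final String name;
        final String labelKey;        // label resource key
        final String descriptionKey;  // description resource key
        final DataType type;
        final boolean required;
        final Integer maxLength;      // optional length limit, null when absent

        PropertyDescriptor(String name, String labelKey, String descriptionKey,
                           DataType type, boolean required, Integer maxLength) {
            this.name = name;
            this.labelKey = labelKey;
            this.descriptionKey = descriptionKey;
            this.type = type;
            this.required = required;
            this.maxLength = maxLength;
        }
    }

    // Returns one message per problem; an empty list means the bag is valid.
    static List<String> validate(List<PropertyDescriptor> descriptors,
                                 Map<String, Object> bag) {
        List<String> errors = new ArrayList<>();
        for (PropertyDescriptor d : descriptors) {
            Object value = bag.get(d.name);
            if (value == null) {
                if (d.required) {
                    errors.add(d.name + ": required value is missing");
                }
                continue;
            }
            if (!matches(d.type, value)) {
                errors.add(d.name + ": expected a value of type " + d.type);
                continue;
            }
            // Assumption: the length limit is only meaningful for strings.
            if (d.maxLength != null && value instanceof String
                    && ((String) value).length() > d.maxLength) {
                errors.add(d.name + ": longer than " + d.maxLength + " characters");
            }
        }
        return errors;
    }

    private static boolean matches(DataType type, Object value) {
        switch (type) {
            case STRING:  return value instanceof String;
            case DATE:    return value instanceof Date;
            case NUMBER:  return value instanceof Number;
            case BOOLEAN: return value instanceof Boolean;
            default:      return false;
        }
    }
}
```

The same descriptor list could drive UI rendering through the two resource
keys, which is why they are carried alongside the validation rules.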


    Fundamentally, I do not want to turn CloudStack into an erector set with
    more screws than nuts, which is a risk with highly pluggable
    architectures.  As such, I think we would need to tightly bound the scope
    of drivers and their behaviors to prevent the loss of system usability
    and stability.  My thinking is that drivers would be packaged into a
    custom JAR, a CAR (CloudStack ARchive), that would be structured as
    follows:


    - META-INF
      - MANIFEST.MF
      - driver.yaml (driver metadata (e.g. version, name, description,
      etc.) serialized in YAML format)
      - LICENSE (a text file containing the driver's license)
    - lib (driver dependencies)
    - classes (driver implementation)
    - resources (driver message files and potentially JS resources)
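
Since a CAR as laid out above is an ordinary JAR underneath, a loader could
check the layout with the standard java.util.jar classes.  A minimal sketch,
assuming the three META-INF entries are mandatory (the proposal does not say
which entries are required); CarLayout and missingEntries are hypothetical
names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper; not part of CloudStack.
final class CarLayout {

    // Assumption: these three entries from the proposed layout are mandatory.
    static final List<String> REQUIRED = Arrays.asList(
        "META-INF/MANIFEST.MF",
        "META-INF/driver.yaml",
        "META-INF/LICENSE");

    // Returns the required entries that are absent from the archive.
    static List<String> missingEntries(Path car) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(car.toFile())) {
            for (String name : REQUIRED) {
                if (jar.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```

An empty result would mean the archive is at least structurally a CAR.
Parsing driver.yaml itself is left out because it would need a YAML library.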


    The management server would acquire drivers through a simple scan of a
    URL (e.g. file directory, S3 bucket, etc.).  For every CAR object found,
    the management server would create an execution environment (likely a
    dedicated ExecutorService and Classloader), and transition the state of
    the driver to Running (the exact state model would need to be worked
    out).  To be really nice, we could develop a custom Ant task/Maven
    plugin/Gradle plugin to create CARs.  I can also imagine opportunities
    to add hooks to this model to register instrumentation information with
    JMX and authorization.
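
The scan-and-start step described above could be sketched roughly as follows
for the local directory case.  This is an illustration under assumptions, not
a design: DriverLoader, DriverEnvironment, and the three-value State enum are
hypothetical, the state model is a placeholder, and the class loader simply
points at the archive root rather than at the classes and lib folders inside
it.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical names; the state model is a placeholder.
final class DriverLoader {

    enum State { DISCOVERED, RUNNING, STOPPED }

    // One environment per CAR: its own class loader and its own executor.
    static final class DriverEnvironment implements AutoCloseable {
        final Path archive;
        final URLClassLoader classLoader;
        final ExecutorService executor;
        private volatile State state = State.DISCOVERED;

        DriverEnvironment(Path archive) throws IOException {
            this.archive = archive;
            // Simplification: a real loader would have to expose the classes/
            // and lib/ folders inside the archive, not the archive root.
            this.classLoader = new URLClassLoader(
                new URL[] { archive.toUri().toURL() },
                DriverLoader.class.getClassLoader());
            this.executor = Executors.newSingleThreadExecutor();
        }

        State state() { return state; }

        void start() { state = State.RUNNING; }

        @Override
        public void close() throws IOException {
            executor.shutdown();
            classLoader.close();
            state = State.STOPPED;
        }
    }

    // Scans a local directory for *.car files and starts one environment each.
    static List<DriverEnvironment> scan(Path directory) throws IOException {
        List<DriverEnvironment> started = new ArrayList<>();
        try (DirectoryStream<Path> cars =
                 Files.newDirectoryStream(directory, "*.car")) {
            for (Path car : cars) {
                DriverEnvironment env = new DriverEnvironment(car);
                env.start();
                started.add(env);
            }
        }
        return started;
    }
}
```

Separate class loaders keep conflicting dependency versions apart, and a
dedicated executor keeps a driver's work off the core threads.  Neither is a
real sandbox, so a driver can still exhaust memory or CPU in this sketch.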


    To keep the scope of this email confined, we would introduce the general
    notion of a Resource, and (hand wave hand wave) eventually
    compartmentalize the execution of work around a resource [1].  This
    (hand waved) compartmentalization would give us the controls necessary
    to safely and reliably perform in-place driver upgrades.  For an initial
    release, I would recommend implementing the abstractions, loading
    mechanism, extension data model, and discovery features.  With these
    capabilities in place, we could attack the in-place upgrade model.
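
One way to picture the compartmentalization idea, purely as an illustration:
route all work for a resource through a gate that can drain in-flight
operations before the driver instance is swapped.  The sketch below uses a
read/write lock inside a single JVM.  ResourceGate is a hypothetical name, and
the approach is an assumption of this example, not the mechanism proposed in
[1].

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical single-JVM gate around the driver that serves one resource.
final class ResourceGate<D> {

    // Fair mode so a pending driver swap is not starved by a stream of work.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private D driver;

    ResourceGate(D initialDriver) {
        this.driver = initialDriver;
    }

    // Runs one unit of work against whichever driver instance is current.
    <R> R execute(Function<D, R> work) {
        lock.readLock().lock();
        try {
            return work.apply(driver);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Waits for in-flight work to finish, holds back new work, then swaps.
    void replaceDriver(D replacement) {
        lock.writeLock().lock();
        try {
            driver = replacement;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Work submitted during a swap simply waits on the read lock and then runs
against the new driver.  Long-running or distributed operations would need
more than this, which is part of what the email leaves open.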

    If we were to adopt such a pluggable capability, we would have the
    opportunity to decouple the vendor and CloudStack release schedules.
    For example, if a vendor were introducing a new product that required a
    new or updated driver, they would no longer need to wait for a
    CloudStack release to support it.  They would also gain the ability to
    fix high priority defects in the same manner.

    I have hand waved a number of issues that would need to be resolved
    before such an approach could be implemented.  However, I think we need
    to decide, as a community, that it is worth devoting energy and effort
    to enhancing the plugin/driver model, and decide the goals of that
    effort, before diving head first into the deep rabbit hole of
    design/implementation.

    Thoughts? (/me ducks)
    -John

    [1]: My opinions on the matter from CloudStack Collab 2013 ->
    http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management




    --
    *Mike Tutkowski*
    *Senior CloudStack Developer, SolidFire Inc.*
    e: mike.tutkowski@solidfire.com
    o: 303.746.7302
    Advancing the way the world uses the
    cloud<http://solidfire.com/solution/overview/?video=play>*™*





