Subject: Re: [DISCUSS/PROPOSAL] Upgrading Driver Model
From: Mike Tutkowski <mike.tutkowski@solidfire.com>
To: dev@cloudstack.apache.org
Cc: Daan Hoogland, Hugo Trippaers, "La Motta, David"
Date: Tue, 20 Aug 2013 16:39:32 -0600

I agree, John - let's get consensus first, then talk timetables.

On Tue, Aug 20, 2013 at 4:31 PM, John Burwell wrote:

> Mike,
>
> Before we can dig into timelines or implementations, I think we need to
> get consensus on the problem to be solved and the goals. Once we have a
> proper understanding of the scope, I believe we can chunk the work
> across a set of development lifecycles. The subject is vast, but it
> also has a far-reaching impact on both the storage and network layer
> evolution efforts. As such, I believe we need to start addressing it as
> part of the next release.
>
> As a separate thread, we need to discuss the timeline for the next
> release. I think we need to avoid the time compression caused by the
> overlap of the 4.1 stabilization effort and 4.2 development. Therefore,
> I don't think we should consider development of the next release
> started until the first 4.2 RC is released.
> I will try to open a separate discussion thread for this topic, as well
> as tie in the discussion of release code names.
>
> Thanks,
> -John
>
> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski wrote:
>
> > Hey John,
> >
> > I think this is some great stuff. Thanks for the write-up.
> >
> > It looks like you have ideas around what might go into a first
> > release of this plug-in framework. Were you thinking we'd have enough
> > time to squeeze that first rev into 4.3? I'm just wondering (it's not
> > a huge deal to hit that release for this) because we would only have
> > about five weeks.
> >
> > Thanks
> >
> >
> > On Tue, Aug 20, 2013 at 3:43 PM, John Burwell wrote:
> >
> >> All,
> >>
> >> In capturing my thoughts on storage, my thinking backed into the
> >> driver model. While we have the beginnings of such a model today, I
> >> see the following deficiencies:
> >>
> >> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
> >> each have a slightly different model for allowing system
> >> functionality to be extended/substituted. These differences raise
> >> the barrier to entry for vendors seeking to extend CloudStack and
> >> accrete code paths to be maintained and verified.
> >> 2. *Leaky Abstraction*: Plugins are registered through a Spring
> >> configuration file. In addition to being operator-unfriendly (most
> >> sysadmins are not Spring experts, nor do they want to be), we expose
> >> the core bootstrapping mechanism to operators. Therefore, a
> >> misconfiguration could negatively impact the injection/configuration
> >> of internal management server components. Essentially, we would be
> >> handing them a loaded shotgun pointed at our right foot.
> >> 3. *Nondeterministic Load/Unload Model*: Because the core loading
> >> mechanism is Spring, the management server has little control over
> >> the timing and order of component loading/unloading.
> >> Changes to the Management Server's component dependency graph could
> >> break a driver by causing it to be started at an unexpected time.
> >> 4. *Lack of Execution Isolation*: As Spring components, plugins are
> >> loaded into the same execution context as core management server
> >> components. Therefore, an errant plugin can corrupt the entire
> >> management server.
> >>
> >> For the next revision of the plugin/driver mechanism, I would like
> >> to see us migrate toward a standard pluggable driver model that
> >> supports all of the management server's extension points (e.g.
> >> network devices, storage devices, hypervisors, etc.) with the
> >> following capabilities:
> >>
> >> - *Consolidated Lifecycle and Startup Procedure*: Drivers share a
> >> common state machine and categorization (e.g. network, storage,
> >> hypervisor, etc.) that permits the deterministic calculation of
> >> initialization and destruction order (i.e. network layer drivers ->
> >> storage layer drivers -> hypervisor drivers). Inter-dependencies
> >> would be supported between plugins sharing the same category.
> >> - *In-process Installation and Upgrade*: Adding or upgrading a
> >> driver does not require the management server to be restarted. This
> >> capability implies a system that supports the simultaneous execution
> >> of multiple driver versions and the ability to suspend continued
> >> work on a resource while the underlying driver instance is replaced.
> >> - *Execution Isolation*: The deployment packaging and execution
> >> environment supports different (and potentially conflicting)
> >> versions of dependencies being used simultaneously. Additionally,
> >> plugins would be sufficiently sandboxed to protect the management
> >> server against driver instability.
> >> - *Extension Data Model*: Drivers provide a property bag with a
> >> metadata descriptor to validate and render vendor-specific data.
> >> The contents of this property bag will be provided to every driver
> >> operation invocation at runtime. The metadata descriptor would be a
> >> lightweight description that provides a label resource key, a
> >> description resource key, a data type (string, date, number,
> >> boolean), a required flag, and an optional length limit.
> >> - *Introspection*: Administrative APIs/UIs allow operators to
> >> understand which drivers are present in the system, their
> >> configuration, and their current state.
> >> - *Discoverability*: Optionally, drivers can be discovered via a
> >> project repository definition (similar to Yum), allowing drivers to
> >> be remotely acquired and operators to be notified of update
> >> availability. The project would also provide, free of charge,
> >> certificates to sign plugins. This mechanism would support local
> >> mirroring for air-gapped management networks.
> >>
> >> Fundamentally, I do not want to turn CloudStack into an erector set
> >> with more screws than nuts, which is a risk with highly pluggable
> >> architectures. As such, I think we would need to tightly bound the
> >> scope of drivers and their behaviors to prevent the loss of system
> >> usability and stability. My thinking is that drivers would be
> >> packaged into a custom JAR, a CAR (CloudStack ARchive), that would
> >> be structured as follows:
> >>
> >> - META-INF
> >>   - MANIFEST.MF
> >>   - driver.yaml (driver metadata (e.g. version, name, description,
> >>     etc.) serialized in YAML format)
> >> - LICENSE (a text file containing the driver's license)
> >> - lib (driver dependencies)
> >> - classes (driver implementation)
> >> - resources (driver message files and potentially JS resources)
> >>
> >> The management server would acquire drivers through a simple scan of
> >> a URL (e.g. file directory, S3 bucket, etc.).
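The scan-and-load flow being described here could be sketched roughly as
follows. This is a minimal illustration only, not CloudStack code: the
names DriverHandle, DriverState, and CarScanner are hypothetical, and the
driver.yaml parsing and the real state model are deliberately omitted.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical per-driver lifecycle states (the proposal notes the exact
// state model would still need to be worked out).
enum DriverState { DISCOVERED, RUNNING, SUSPENDED, STOPPED }

// Hypothetical handle pairing a CAR file with its isolated execution
// environment (dedicated ClassLoader plus ExecutorService).
class DriverHandle {
    final File car;
    final ClassLoader classLoader;   // isolates driver classes/dependencies
    final ExecutorService executor;  // confines driver work to its own threads
    DriverState state = DriverState.DISCOVERED;

    DriverHandle(File car) throws Exception {
        this.car = car;
        // A dedicated classloader per CAR keeps conflicting dependency
        // versions from colliding with the management server or with
        // other drivers.
        this.classLoader = new URLClassLoader(
                new URL[] { car.toURI().toURL() },
                DriverHandle.class.getClassLoader());
        this.executor = Executors.newSingleThreadExecutor();
    }
}

public class CarScanner {
    // Scan a directory for CAR files and bring each driver to Running.
    public static List<DriverHandle> scan(File dir) throws Exception {
        List<DriverHandle> drivers = new ArrayList<>();
        File[] cars = dir.listFiles((d, name) -> name.endsWith(".car"));
        if (cars == null) return drivers;
        for (File car : cars) {
            DriverHandle handle = new DriverHandle(car);
            // A real implementation would read META-INF/driver.yaml here
            // and validate the metadata descriptor before transitioning.
            handle.state = DriverState.RUNNING;
            drivers.add(handle);
        }
        return drivers;
    }
}
```

The same scan could just as easily walk an S3 bucket listing instead of a
local directory; the per-driver ClassLoader/ExecutorService pairing is the
part that buys the isolation the proposal asks for.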
> >> For every CAR object found, the management server would create an
> >> execution environment (likely a dedicated ExecutorService and
> >> ClassLoader) and transition the state of the driver to Running (the
> >> exact state model would need to be worked out). To be really nice,
> >> we could develop a custom Ant task/Maven plugin/Gradle plugin to
> >> create CARs. I can also imagine opportunities to add hooks to this
> >> model to register instrumentation information with JMX and
> >> authorization.
> >>
> >> To keep the scope of this email confined, we would introduce the
> >> general notion of a Resource, and (hand wave hand wave) eventually
> >> compartmentalize the execution of work around a resource [1]. This
> >> (hand-waved) compartmentalization would give us the controls
> >> necessary to safely and reliably perform in-place driver upgrades.
> >> For an initial release, I would recommend implementing the
> >> abstractions, loading mechanism, extension data model, and discovery
> >> features. With these capabilities in place, we could attack the
> >> in-place upgrade model.
> >>
> >> If we were to adopt such a pluggable capability, we would have the
> >> opportunity to decouple the vendor and CloudStack release schedules.
> >> For example, if a vendor were introducing a new product that
> >> required a new or updated driver, they would no longer need to wait
> >> for a CloudStack release to support it. They would also gain the
> >> ability to fix high-priority defects in the same manner.
> >>
> >> I have hand-waved a number of issues that would need to be resolved
> >> before such an approach could be implemented. However, I think we
> >> need to decide, as a community, that it is worth devoting energy and
> >> effort to enhancing the plugin/driver model, and agree on the goals
> >> of that effort, before driving head first into the deep rabbit hole
> >> of design/implementation.
> >>
> >> Thoughts?
> >> (/me ducks)
> >>
> >> -John
> >>
> >> [1]: My opinions on the matter from CloudStack Collab 2013 ->
> >> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the cloud
> > *™*

-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud
*™*