From: Edison Su <Edison.su@citrix.com>
To: cloudstack-dev@incubator.apache.org
Date: Tue, 15 Jan 2013 17:35:53 -0800
Subject: RE: new storage framework update

After a lengthy discussion (more than two hours) with John on Skype, I think
we figured out the difference between us. The API proposed by John is more
at the execution level (that's where the input/output stream comes from),
and it assumes that both the source and the destination object are operated
on at the same place: either inside the SSVM, or on the hypervisor host. The
API I proposed is more about how to hook a vendor's own storage into
CloudStack's mgt server, so the vendor can replace the process that decides
how and where the storage is operated on.

Let's talk about the execution model first, since it has a huge impact on
the design. The execution model is about where operations issued by the mgt
server get executed. Currently there is no universal execution model; it's
quite different for each hypervisor.

For KVM, the mgt server sends commands to the KVM host, where a Java agent
executes the commands sent by the mgt server.

For XenServer, most commands are executed on the mgt server, which calls
xapi, which in turn talks to the XenServer host. But we do put some Python
code on the XenServer host for operations not supported by xapi.

For VMware, most commands are executed on the mgt server, which talks to
the vCenter API, while some of them are executed inside the SSVM.

Due to these different execution models, we run into a problem of how and
where to access a storage device. For example, take a storage box that has
its own management API. If I want to create a volume on the storage box,
where should I call the storage box's create-volume API? If we follow the
execution models above, we need to call the API in different places, and
even worse, write the API call in different languages: for KVM you may need
to write Java code in the KVM agent, for XenServer a xapi Python plugin, for
VMware Java code inside the SSVM, and so on.

But if the storage box already has a management API, why not just call it
inside the CloudStack mgt server? Then the device vendor writes Java code
once, for all the different hypervisors. If we don't enforce the execution
model, the storage framework should have a hook in the management server,
and the device vendor can decide where to execute the commands sent by the
mgt server.

That's what my datastoredriver layer is for. Take the take-snapshot diagram
as an example:
https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965000

The datastoredriver runs inside the mgt server, but the driver itself
decides where to execute the "takesnapshot" API: it can send a command to
the hypervisor host, directly call the storage box's API, directly call the
hypervisor's own API, or call another service running outside of the
CloudStack mgt server. It's all up to the implementation of the driver.

Does it make sense? If so, the device driver should not take an input/output
stream as a parameter, as that enforces the execution model, which I don't
think is necessary.
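To make the hook concrete, here is a minimal Java sketch of that seam. The
names are hypothetical; only the datastoredriver idea itself comes from the
design above. The same interface, always invoked inside the mgt server,
hides two very different execution models:

    // Hypothetical names throughout. The framework always calls the driver
    // in the mgt server; the implementation decides where the work happens.
    interface PrimaryStorageDriver {
        // Returns an identifier for the new snapshot.
        String takeSnapshot(long volumeId);
    }

    interface StorageBoxClient { String snapshotVolume(long volumeId); }
    interface HostAgentClient { String send(String command, long volumeId); }

    // Vendor path: call the storage box's own management API directly from
    // the mgt server. Written once, in Java, for every hypervisor type.
    final class StorageBoxDriver implements PrimaryStorageDriver {
        private final StorageBoxClient box; // hypothetical vendor SDK
        StorageBoxDriver(StorageBoxClient box) { this.box = box; }
        public String takeSnapshot(long volumeId) {
            return box.snapshotVolume(volumeId); // no SSVM, no host plugin
        }
    }

    // Classic path: forward a command to the agent on the hypervisor host,
    // as KVM does today. Same caller, different execution model.
    final class HostAgentDriver implements PrimaryStorageDriver {
        private final HostAgentClient agent; // hypothetical agent transport
        HostAgentDriver(HostAgentClient agent) { this.agent = agent; }
        public String takeSnapshot(long volumeId) {
            return agent.send("TakeSnapshotCommand", volumeId);
        }
    }

Either way, the caller in the mgt server is identical; only the driver
implementation knows whether the work happened on the box, on a host, or in
the SSVM.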
BTW, John and I will discuss the matter tomorrow on Skype; if you want to
join, please let me know.

> -----Original Message-----
> From: Edison Su [mailto:Edison.su@citrix.com]
> Sent: Monday, January 14, 2013 3:19 PM
> To: cloudstack-dev@incubator.apache.org
> Subject: RE: new storage framework update
>
>
> > -----Original Message-----
> > From: John Burwell [mailto:jburwell@basho.com]
> > Sent: Friday, January 11, 2013 12:30 PM
> > To: cloudstack-dev@incubator.apache.org
> > Subject: Re: new storage framework update
> >
> > Edison,
> >
> > I think we are speaking past each other a bit. My intention is to
> > separate logical and physical storage operations in order to simplify
> > the implementation of new storage providers. Also, in order to
> > support the widest range of storage mechanisms, I want to eliminate
> > all interface assumptions (implied and explicit) that a storage device
> > supports a file
>
> I think if the NFS secondary storage is optional, then all the
> inefficiency related to object storage will go away?
>
> > system. These two issues make the implementation of efficient storage
> > drivers extremely difficult. For example, for object stores, we have
> > to create polling synchronization threads that add complexity,
> > overhead, and latency to the system. If we could connect the
> > OutputStream of a source (such as an HTTP
> > upload) to the InputStream of the object store, transfer operations
> > would be far simpler and more efficient. The conflation of logical and
> > physical operations also makes it more difficult and complex to
> > reliably and maintainably implement cross-cutting storage features
> > such as at-rest encryption. In my opinion, the current design in
> > Javelin makes progress on the first point, but does not address the
> > second point. Therefore, I propose that we refine the design to
> > explicitly separate logical and physical operations and utilize the
> > higher-level I/O abstractions provided by the JDK to remove any
> > interface requirements for file-based operations.
> >
> > Based on these goals, I propose keeping the logical Image,
> > ImageMotion, Volume, Template, and Snapshot services. These services
> > would be responsible for logical storage operations (e.g.
> > createVolumeFromTemplate, downloadTemplate, createSnapshot,
> > deleteSnapshot, etc.). To perform physical operations, the
> > StorageDevice concept would be added with the following operations:
> >
> > * void read(URI aURI, OutputStream anOutputStream) throws IOException
> > * void write(URI aURI, InputStream anInputStream) throws IOException
> > * Set list(URI aURI) throws IOException
> > * boolean delete(URI aURI) throws IOException
> > * StorageDeviceType getType()
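(Rendered as a compilable Java interface, the contract above would look
roughly like the sketch below. The Set element type and the OBJECT_STORE
constant are assumptions; BLOCK and FILE_SYSTEM appear later in this thread,
and the method signatures come from the list above.)

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import java.util.Set;

    enum StorageDeviceType { FILE_SYSTEM, BLOCK, OBJECT_STORE }

    // Physical operations only: no content awareness, and no assumption
    // that the device exposes a filesystem.
    interface StorageDevice {
        // Stream the bytes stored at the logical URI into the OutputStream.
        void read(URI aURI, OutputStream anOutputStream) throws IOException;

        // Store the bytes from the InputStream at the logical URI.
        void write(URI aURI, InputStream anInputStream) throws IOException;

        // List the logical URIs contained under the given URI.
        Set<URI> list(URI aURI) throws IOException;

        // Returns true if the data at the URI existed and was deleted.
        boolean delete(URI aURI) throws IOException;

        StorageDeviceType getType();
    }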
>
> I agree with your simplified interface, but I am still cautious that a
> simple URI may not be enough.
> For example, what if, at the driver level, the driver developer wants to
> know extra information about the object being operated on?
> I ended up with new APIs like:
> https://cwiki.apache.org/confluence/download/attachments/30741569/provider.jpg?version=1&modificationDate=1358168083079
> At the driver level, it works on two interfaces:
> DataObject, which is the interface of volume/snapshot/template.
> DataStore, which is the interface of all the primary storage or image storage.
> The API looks pretty much like what you proposed:
> grantAccess(DataObject, EndPoint ep): make the object accessible to an
> endpoint, and return a URI representing the object. This is used while
> moving the object around different storages. For example, in the
> create-volume-from-template sequence diagram:
> https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767,
> datamotionstrategy calls grantAccess on both the source and destination
> datastore, gets two URIs representing the source and destination object,
> and then sends the URIs to an endpoint (the agent running inside the SSVM,
> or a hypervisor host) to conduct the actual copy operation.
> revokeAccess: the opposite of the above API.
> listObjects(DataStore): list the objects on a datastore.
> createAsync(DataObject): create an object on a datastore. The driver
> shouldn't care what kind of object it is, only about the size of the
> object and the data store of the object; all of this information can be
> directly inferred from the DataObject. If the driver needs more
> information about the object, the driver developer can get the id of the
> object, query the database, and find out more. This interface makes no
> assumption about the underlying storage; it can be primary storage,
> S3/Swift, an FTP server, or any other writable storage.
> deleteAsync(DataObject): delete an object on a datastore; the opposite of
> createAsync.
> copyAsync(DataObject, DataObject): copy the src object to the dest object.
> It's for storage migration. Some storage vendors or hypervisors have their
> own efficient way to migrate storage from one place to another. Most of
> the time, migration across different vendors or different storage types
> (primary <=> image storage) needs to go through datamotionservice, which
> will be covered later.
> canCopy(DataObject, DataObject): helps datamotionservice make the decision
> on storage migration.
>
> For the primary storage driver, there are two extra APIs:
> takeSnapshot(SnapshotInfo snapshot): take a snapshot.
> revertSnapshot(SnapshotInfo snapshot): revert a snapshot.
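(Sketched as Java, that driver-level surface would look roughly like the
following. Only the method names and the DataObject/DataStore/EndPoint
concepts come from the list above; the return types, the callback shape,
and the supporting type bodies are assumptions.)

    import java.net.URI;
    import java.util.List;

    // Supporting types sketched just far enough to compile.
    interface DataStore { long getId(); }
    interface DataObject { long getId(); long getSize(); DataStore getDataStore(); }
    interface SnapshotInfo extends DataObject { }
    interface EndPoint { } // an agent inside the SSVM, or a hypervisor host
    interface AsyncCallback<T> { void complete(T result); }

    interface DataStoreDriver {
        // Expose the object to an endpoint; return a URI the endpoint can reach.
        URI grantAccess(DataObject obj, EndPoint ep);
        void revokeAccess(DataObject obj, EndPoint ep);

        List<DataObject> listObjects(DataStore store);

        // No assumption about the underlying storage: primary, S3/Swift, FTP, ...
        void createAsync(DataObject obj, AsyncCallback<Boolean> callback);
        void deleteAsync(DataObject obj, AsyncCallback<Boolean> callback);

        // Vendor-efficient copy when possible; otherwise datamotionservice
        // decomposes the move, guided by canCopy().
        void copyAsync(DataObject src, DataObject dest, AsyncCallback<Boolean> callback);
        boolean canCopy(DataObject src, DataObject dest);
    }

    interface PrimaryDataStoreDriver extends DataStoreDriver {
        void takeSnapshot(SnapshotInfo snapshot, AsyncCallback<Boolean> callback);
        void revertSnapshot(SnapshotInfo snapshot, AsyncCallback<Boolean> callback);
    }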
Some storage vendor or hypervisor has= its > own efficient way to migrate storage from one place to another. Most of t= he > time, the migration across different vendors or different storage > types(primary <=3D> image storage), needs to go to datamotionservice, whi= ch > will be covered later. > canCopy(DataObject, DataObject): it helps datamotionservice to make the > decision on storage migration. > > For primary storage driver, there are extra two APIs: > takeSnapshot(SnapshotInfo snapshot): take snapshot > revertSnapshot(SnapshotInfo snapshot): revert snapshot. > > > > > > This interface does not mirror any that I am aware of the current JDK. > > Instead, it leverages the facilities it provides to abstract I/O > > operations between different types of devices (e.g. reading data from > > a socket and writing to a file or reading data from a socket and writin= g it to > another socket). > > Specifying the input or output stream allows the URI to remain logical > > and device agnostic because the device is being a physical stream from > > which to read or write with it. Therefore, specifying a logical URI > > without the associated stream would require implicit assumptions to be > > made by the StorageDevice and clients regarding data acquisition. To > > perform physical operations, one or more instances of StorageDevice > > would be passed into to the logical service methods to compose into a > > set of physical operations to perform logical operation (e.g. copying > > a template from secondary storage to a volume). > > > I think our difference is only about the parameter of the API is an URI o= r an > Object. > Using an Object instead of a plain URI, using an object maybe more flexib= le, > and the DataObject itself has an API called: getURI, which can translate = the > Object into an URI. See the interface of DataObject: > https://cwiki.apache.org/confluence/download/attachments/30741569/data > +model.jpg?version=3D1&modificationDate=3D1358171015660 > > > > > > StorageDevices are not intended to be content aware. They simply map > > logical URIs to the physical context they represent (a path on a > > filesystem, a bucket and key in an object store, a range of blocks in > > a block store, etc) and perform the requested operation on the > > physical context (i.e. read a byte stream from the physical location > > representing "/template/2/200", delete data represented by > > "/snapshot/3/300", list the contents of the physical location > > represented by "/volume/4/400", etc). In my opinion, it would be a > > misuse of a URI to infer an operation from their content. Instead, > > the VolumeService would expose a method such as the following to > perform the creation of a volume from a template: > > > > createVolumeFromTemplate(Template aTemplate, StorageDevice > > aTemplateDevice, Volume aVolume, StorageDevice aVolumeDevice, > > Hypervisor aHypervisor) > > > > The VolumeService would coordinate the creation of the volume with the > > passed hypervisor and, using the InputStream and OutputStreams > > provided by the devices, coordinate the transfer of data between the > > template storage device and the volume storage device. Ideally, the > > Template and Volume classes would encapsulate the rules for logical > > URI creation in a method. 
> > Similarly, the SnapshotService would expose
> > a method such as the following to take a snapshot of a volume:
> >
> > createSnapshot(Volume aVolume, StorageDevice aSnapshotDevice)
> >
> > The SnapshotService would request the creation of a snapshot for the
> > volume and then request a write of the snapshot data to the
> > StorageDevice through the write method.
>
> I agree, the service has rich APIs, while at the driver level the API
> should be as simple and as neutral to the object operated on as possible.
> I updated the sequence diagrams:
> create volume from template:
> https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767
> add template into image storage:
> https://cwiki.apache.org/confluence/download/attachments/30741569/register+template+on+image+store.png?version=1&modificationDate=1358189565551
> take snapshot:
> https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965438
> backup snapshot into image storage:
> https://cwiki.apache.org/confluence/download/attachments/30741569/backup+snapshot+sequence.png?version=1&modificationDate=1358192407152
>
> Could you help to review?
>
> >
> > I hope these explanations clarify both the design and the motivation
> > of my proposal. I believe it is critical for the project's future
> > development that the storage layer operate efficiently with storage
> > devices that do not support traditional filesystems (e.g. object
> > stores, raw block devices, etc.). There are a fair number of these
> > types of devices which CloudStack will likely need to support in the
> > future. I believe that CloudStack will be well positioned to
> > maintainably and efficiently support them if it carefully separates
> > logical and physical storage operations.
>
> Thanks for your feedback. I rewrote the API last weekend based on your
> suggestion and updated the wiki:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0
> The code is started, but not checked into the javelin branch yet.
>
> >
> > Thanks,
> > -John
> >
> > On Jan 9, 2013, at 8:10 PM, Edison Su wrote:
> >
> > >
> > >
> > >> -----Original Message-----
> > >> From: John Burwell [mailto:jburwell@basho.com]
> > >> Sent: Tuesday, January 08, 2013 8:51 PM
> > >> To: cloudstack-dev@incubator.apache.org
> > >> Subject: Re: new storage framework update
> > >>
> > >> Edison,
> > >>
> > >> Please see my thoughts in-line below. I apologize for the S3-centric
> > >> nature of my example in advance -- it happens to be top of mind for
> > >> obvious reasons ...
> > >>
> > >> Thanks,
> > >> -John
> > >>
> > >> On Jan 8, 2013, at 5:59 PM, Edison Su wrote:
> > >>
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: John Burwell [mailto:jburwell@basho.com]
> > >>>> Sent: Tuesday, January 08, 2013 10:59 AM
> > >>>> To: cloudstack-dev@incubator.apache.org
> > >>>> Subject: Re: new storage framework update
> > >>>>
> > >>>> Edison,
> > >>>>
> > >>>> In reviewing javelin, I feel that there is a missing abstraction.
> > >>>> At the lowest level, storage operations are the storage,
> > >>>> retrieval, deletion, and listing of byte arrays stored at a
> > >>>> particular URI.
> > >>>> In order to implement this concept in the current Javelin branch,
> > >>>> 3-5 strategy classes must be implemented to perform the following
> > >>>> low-level operations:
> > >>>>
> > >>>> * open(URI aDestinationURI): OutputStream throws IOException
> > >>>> * write(URI aDestinationURI, OutputStream anOutputStream) throws
> > >>>> IOException
> > >>>> * list(URI aDestinationURI) : Set throws IOException
> > >>>> * delete(URI aDestinationURI) : boolean throws IOException
> > >>>>
> > >>>> The logic for each of these strategies will be identical, which
> > >>>> will lead to the creation of a support class + glue code (i.e.
> > >>>> either individual adapter classes
> > >>
> > >> I realize that I omitted a couple of definitions in my original
> > >> email. First, the StorageDevice most likely would be implemented
> > >> on a domain object that also contains configuration information
> > >> for a resource. For example, the S3Impl class would also implement
> > >> StorageDevice. On reflection (and a little pseudo coding), I would
> > >> also like to refine my originally proposed StorageDevice interface:
> > >>
> > >> * void read(URI aURI, OutputStream anOutputStream) throws IOException
> > >> * void write(URI aURI, InputStream anInputStream) throws IOException
> > >> * Set list(URI aURI) throws IOException
> > >> * boolean delete(URI aURI) throws IOException
> > >> * StorageDeviceType getType()
> > >>
> > >>>
> > >>> If the lowest api is too opaque, like one URI as a parameter, I am
> > >>> wondering whether it may make the implementation more complicated
> > >>> than it sounds.
> > >>> For example, there are at least three APIs for the primary storage
> > >>> driver: createVolumeFromTemplate, createDataDisk, deleteVolume, and
> > >>> two snapshot-related APIs: createSnapshot, deleteSnapshot.
> > >>> How do we encode the above operations into simple write/delete
> > >>> APIs? If one URI contains too much information, then at the end of
> > >>> the day the receiver side (the code in the hypervisor resource),
> > >>> which is responsible for decoding the URI, becomes complicated.
> > >>> That's the main reason I decided to use more specific APIs instead
> > >>> of one opaque URI.
> > >>> It's also true that if the API is too specific, people need to
> > >>> implement a ton of APIs (mainly imagedatastoredriver,
> > >>> primarydatastoredriver, backupdatastoredriver), all over the place.
> > >>> Which one is better? People can jump in and discuss.
> > >>>
> > >>
> > >> The URI scheme should be a logical, unique, and reversible value
> > >> associated with the type of resource being stored. For example,
> > >> the general form of template URIs would be
> > >> "/template/<account id>/<template id>/template.properties" and
> > >> "/template/<account id>/<template id>/<template file>.vhd".
> > >> Therefore, for account id 2, template id 200, the
> > >> template.properties resource would be assigned a URI of
> > >> "/template/2/200/template.properties". The StorageDevice
> > >> implementation translates the logical URI to a physical
> > >> representation. Using S3 as an example, the StorageDevice is
> > >> configured to use bucket jsb-cloudstack at endpoint
> > >> s3.amazonaws.com. The S3 storage device would translate the URI to
> > >> s3://jsb-cloudstack/templates/2/200/template.properties. For an NFS
> > >> storage device mounted on nfs://localhost/cloudstack, the
> > >> StorageDevice would translate the logical URI to
> > >> nfs://localhost/cloudstack/template/2/200/template.properties.
> > >> In short, I believe that we can devise a simple scheme that allows
> > >> the StorageDevice to treat the URI path relative to its root.
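(A small sketch of that logical-to-physical translation, reusing the bucket
and mount point from the example above; the mapper classes and their names
are hypothetical.)

    import java.net.URI;

    // Each StorageDevice treats the logical path as relative to its
    // configured root.
    final class S3UriMapper {
        private final String bucket; // e.g. "jsb-cloudstack"
        S3UriMapper(String bucket) { this.bucket = bucket; }

        // "/template/2/200/template.properties"
        //   -> "s3://jsb-cloudstack/template/2/200/template.properties"
        URI toPhysical(URI logical) {
            return URI.create("s3://" + bucket + logical.getPath());
        }
    }

    final class NfsUriMapper {
        private final URI mountRoot; // e.g. nfs://localhost/cloudstack
        NfsUriMapper(URI mountRoot) { this.mountRoot = mountRoot; }

        // "/template/2/200/template.properties"
        //   -> "nfs://localhost/cloudstack/template/2/200/template.properties"
        URI toPhysical(URI logical) {
            return URI.create(mountRoot.toString() + logical.getPath());
        }
    }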
> > >>
> > >> To my mind, createVolumeFromTemplate is decomposable into a series
> > >> of StorageDevice#read and StorageDevice#write operations which
> > >> would be issued by the VolumeManager service, such as the following:
> > >>
> > >> public void createVolumeFromTemplate(Template aTemplate,
> > >>     StorageDevice aTemplateDevice, Volume aVolume,
> > >>     StorageDevice aVolumeDevice) {
> > >>
> > >>   try {
> > >>
> > >>     // The volume must land on a block or file-system device.
> > >>     if (aVolumeDevice.getType() != StorageDeviceType.BLOCK &&
> > >>         aVolumeDevice.getType() != StorageDeviceType.FILE_SYSTEM) {
> > >>       throw new UnsupportedStorageDeviceException(...);
> > >>     }
> > >>
> > >>     // Pull the template from the template device into a temporary
> > >>     // directory
> > >>     final File aTemplateDirectory = new File(