incubator-cloudstack-dev mailing list archives

From Wido den Hollander <w...@widodh.nl>
Subject Re: new storage framework update
Date Tue, 22 Jan 2013 12:13:39 GMT


On 01/20/2013 11:16 PM, Marcus Sorensen wrote:
>    To me the management server should be the most secure link in the chain.
> It has the keys to the kingdom, full access to the cloud database, can send
> commands to each host, etc. It doesn't make sense to say that we trust the
> management server to destroy/expunge volumes, destroy/expunge VMs, have full
> control to wipe out the entire system, but not to create/remove/grant
> access to volumes on the SAN.
>

There has to be a side note here. Granting access to the API of a SAN can 
also mean the management server could remove snapshots created by the 
SAN that are used for disaster recovery.

> As such, if you can access the management server publicly, that's a
> security issue. A firewall or proxy doesn't help; if the tomcat server is
> compromised then remote execution could be possible even through a proxy or
> firewall. What really needs to happen here is an (optional) standalone web
> ui that can be installed on a web server. That way if the web server is
> compromised, then the attacker still can only make cloudstack API calls.
> They can perhaps catch user credentials and wreak havoc that way, but they
> can't dump your database or make sweeping changes, no immediate keys to the
> kingdom. They are limited to doing what the API lets them with the
> credentials they have. Not sure how easy it would be to install the
> existing UI on a different server and point it at a management server for
> API access.
>

The UI is nothing more than JavaScript running client-side talking to 
the API. The API still has to be open for the world to talk to, whether 
you are using the UI or not.

An exploit in the API code could still expose you in some way.

>     The hypervisor/agents should be relatively unprivileged, as they run on
> a public network and run untrusted guests. They are also weak links
> security-wise, and should be in a DMZ-like network. Again, if an attacker
> breaks in, they have access to what is on that particular hypervisor
> (whatever other VMs are there and perhaps a handful of shared volumes), but
> not the ability to destroy the whole system. That is, unless your
> hypervisors/hosts have full authorization to reconfigure your SAN.
>

A hypervisor is indeed connected publicly, but that doesn't mean it has a 
publicly available IP. There could be a bridge (KVM) over which public 
internet traffic from guests flows, but that doesn't mean you can reach 
the hypervisor itself.

>    The management server is already the most privileged part of cloudstack.
> I think spreading out the privileges would be a bad idea. If the hypervisor
> host is controlling the SAN, the attacker can do something like delete all
> volumes on the SAN either from the hypervisor, or from the management
> server(by sending agent commands). If the management server is talking to
> the SAN, then a compromised hypervisor has no access to delete/reconfigure
> SAN. It can only see those volumes that the management server has granted
> it via the SAN api. So you broaden your weakness by any privilege you give
> the hypervisor host.
>
>    I get the sense though that Wido's concern is not just about security,
> but about design/architecture by his final comment. I don't quite
> understand it though. The management servers are the orchestrators, and
> zones, pods, clusters are building blocks for the management server to use
> in orchestration. These are all collections of resources that the
> management server makes sense of.

Indeed, it's not ONLY about security. Right now for me a cluster is:
- Hypervisors
- Switch(es)
- SAN

You build that nicely into a rack (a pod) and the SAN doesn't have to be 
connected to anything else; it's just connected to that switch.

In the new scenario you have to make the SAN's API available to the mgmt 
server, so you have to start running extra VLANs or physical cabling to 
get it to the management server.

Suddenly you have to start interconnecting more switches to give the 
mgmt server Layer 3 access to the SAN's API.

Wido

>
> On Sun, Jan 20, 2013 at 8:33 AM, Wido den Hollander <wido@widodh.nl> wrote:
>
>>
>>
>> On 01/19/2013 12:50 AM, Edison Su wrote:
>>
>>>
>>>
>>>   -----Original Message-----
>>>> From: Wido den Hollander [mailto:wido@widodh.nl]
>>>> Sent: Friday, January 18, 2013 3:26 PM
>>>> To: cloudstack-dev@incubator.apache.org
>>>> Subject: Re: new storage framework update
>>>>
>>>>
>>>>
>>>> On 01/18/2013 08:09 PM, Edison Su wrote:
>>>>
>>>>>
>>>>>
>>>>>   -----Original Message-----
>>>>>> From: Wido den Hollander [mailto:wido@widodh.nl]
>>>>>> Sent: Friday, January 18, 2013 12:51 AM
>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>> Subject: Re: new storage framework update
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 01/16/2013 02:35 AM, Edison Su wrote:
>>>>>>
>>>>>>> After a lengthy discussion(more than two hours) with John on Skype,
>>>>>>> I
>>>>>>>
>>>>>> think we figured out the difference between us.  The API proposed by
>>>>>> John is more at the execution level, that's where input/output stream
>>>>>> coming from, which assumes that both source and destination object
>>>>>> will be operated at the same place(either inside ssvm, or on
>>>>>> hypervisor host). While the API I proposed is more about how to hook
>>>>>> up vendor's own storage into cloudstack's mgt server, thus can
>>>>>> replace the process on how and where to operate on the storage.
>>>>>>
>>>>>>> Let's talk about the execution model at first, which will have huge
>>>>>>> impact
>>>>>>>
>>>>>> on the design we made. The execution model is about where to execute
>>>>>> operations issued by mgt server. Currently, there is no universal
>>>>>> execution model, it's quite different for each hypervisor.
>>>>>>
>>>>>>>      E.g. for KVM, mgt server will send commands to KVM host, there is
>>>>>>> a java
>>>>>>>
>>>>>> agent running on kvm host, which can execute command send by mgt
>>>>>>
>>>>> server.
>>>>
>>>>> For xenserver, most of commands will be executed on mgt server,
>>>>>>> which
>>>>>>>
>>>>>> will call xapi, then talking to xenserver host.  But we do put some
>>>>>> python code at xenserver host, if there are operations not supported by
>>>>>>
>>>>> xapi.
>>>>
>>>>> For vmware, most of commands will be executed on mgt server, which
>>>>>>>
>>>>>> talking to vcenter API, while some of them will be executed inside
>>>>>> SSVM.
>>>>>>
>>>>>>> Due to the different execution models, we'll get into a problem
>>>>>>> about how
>>>>>>>
>>>>>> and where to access storage device. For example, there is a storage
>>>>>> box, which has its own management API to be accessed. Now I want to
>>>>>> create a volume on the storage box, where should I call stoage box's
>>>>>> create volume api? If we follow up above execution models, we need to
>>>>>> call the api at different places and even worse, you need to write
>>>>>> the API call in different languages. For kvm, you may need to write
>>>>>> java code in kvm agent, for xenserver, you may need to write a xapi
>>>>>> python plugin, for vmware, you may need to put the java code inside
>>>>>>
>>>>> ssvm  etc.
>>>>
>>>>> But if the storage box already has management api, why just call it
>>>>>>> inside
>>>>>>>
>>>>>> cloudstack mgt server, then device vendor should just write java code
>>>>>> once, for all the different hypervisors? If we don't enforce the
>>>>>> execution model, then the storage framework should have a hook in
>>>>>> management server, device vendor can decide where to execute
>>>>>>
>>>>> commands send by mgt server.
>>>>
>>>>>
>>>>>> With this you are assuming that the management server always has
>>>>>> access to the API of the storage box?
>>>>>>
>>>>>> What if the management server is in network X (say Amsterdam) and I
>>>>>> have a zone in London where my storage box X is in a private network.
>>>>>>
>>>>>> The only one that can access the API then is the hypervisor, so the
>>>>>> calls have to go through there.
>>>>>>
>>>>>> I don't want to encourage people to write "stupid code" where they
>>>>>> assume that the management server is this thing which is tied up into
>>>>>>
>>>>> every network.
>>>>
>>>>>
>>>>> I think we will change the current mgt server deployment model to
>>>>> cluster of mgt servers per zone, instead of a cluster of mgt servers
>>>>> manage the whole zones:
>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/AWS-Style+Regions
>>>>>
>>>>> If above works, then mgt server can assume it can access storage
>>>>> box's API. BTW, the mgt server does need to access some private mgt API,
>>>>>
>>>> such as F5/netscaler etc.
>>>>
>>>> Imho that would be a big security flaw. The mgmt server has to be
>>>> publicly available for external users and normally you don't want
>>>> public machines to be able to reach the API of your storage directly.
>>>>
>>>> I know that in the current model it already works this way for some
>>>> hardware
>>>> loadbalancers, but I would think it would not be wise to assume that a
>>>> management server would have access to all storage API's in that zone.
>>>>
>>>> A cluster should still be that bunch of nodes, on their switches with
>>>> some
>>>> storage. The management server only communicates with the hypervisor,
>>>> which in turn can talk to the storage.
>>>>
>>>
>>> If the mgt server can access hypervisor hosts directly, that means if the mgt
>>> server is breached by hackers, then so are your hypervisor hosts (to me,
>>> this is more urgent than breaking the storage box).
>>> Better to not expose the mgt server to the public domain directly; add a load
>>> balancer, proxy or firewall in front of the mgt server.
>>>
>>>
>> A break-in on the management server doesn't explicitly mean that you can
>> also break into the Hypervisor.
>>
>> I personally don't like the management server talking to the storage API
>> directly; imho that breaks the way things are supposed to work.
>>
>> I'm just afraid that you'll get all kinds of ties running through the
>> network where a zone is just a collection of hardware. To me that would
>> diminish the separation between a zone, pod and cluster.
>>
>> Wido
>>
>>
>>>> Otherwise you are going to be pulling lines (VLANs) through your whole
>>>> zone
>>>> just to make every storage API available to the management server.
>>>>
>>>> That is not only a potential security risk to me, it also makes networks
>>>> much
>>>> more complex.
>>>>
>>>> I wouldn't recommend always assuming that the management server can
>>>> directly reach all of those things over Layer3.
>>>>
>>>> Wido
>>>>
>>>>
>>>>>> Wido
>>>>>>
>>>>>>> That's what my datastoredriver layer is used for. Take the taking-snapshot
>>>>>>> diagram as an example:
>>>>>>>
>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965000
>>>>>>> Datastoredriver is running inside mgt server, while datastoredriver
>>>>>>> itself
>>>>>>>
>>>>>> can decide where to execute "takasnapshot" API, driver can send a
>>>>>> command to hypervisor host, or directly call storage box's API, or
>>>>>> directly call hypervisor's own API, or another service running
>>>>>> outside of cloudstack mgt server. It's all up to the implementation of
>>>>>>
>>>>> driver.
>>>>
>>>>> Does it make sense? If it's true, the device driver should not take
>>>>>>> input/out
>>>>>>>
>>>>>> stream as parameter, as it enforces the execution model, which I
>>>>>> don't think it's necessary.
>>>>>>
>>>>>>> BTW, John and I will discuss the matter tomorrow on Skype, if you
>>>>>>> want to
>>>>>>>
>>>>>> join, please let me know.
>>>>>>
>>>>>>>
>>>>>>>   -----Original Message-----
>>>>>>>> From: Edison Su [mailto:Edison.su@citrix.com]
>>>>>>>> Sent: Monday, January 14, 2013 3:19 PM
>>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>>> Subject: RE: new storage framework update
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   -----Original Message-----
>>>>>>>>> From: John Burwell [mailto:jburwell@basho.com]
>>>>>>>>> Sent: Friday, January 11, 2013 12:30 PM
>>>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>>>> Subject: Re: new storage framework update
>>>>>>>>>
>>>>>>>>> Edison,
>>>>>>>>>
>>>>>>>>> I think we are speaking past each other a bit.  My intention is to
>>>>>>>>> separate logical and physical storage operations in order to
>>>>>>>>> simplify the implementation of new storage providers.  Also, in
>>>>>>>>> order to support the widest range of storage mechanisms, I want to
>>>>>>>>> eliminate all interface assumptions (implied and explicit) that a
>>>>>>>>> storage device supports a file
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think if the nfs secondary storage is optional, then all the
>>>>>>>> inefficient related to object storage will get away?
>>>>>>>>
>>>>>>>>   system.  These two issues make implementation of efficient
>>>>>>>>> storage drivers extremely difficult.  For example, for object
>>>>>>>>> stores, we have to create polling synchronization threads that add
>>>>>>>>> complexity, overhead, and latency to the system.  If we could
>>>>>>>>> connect the OutputStream of a source (such as an HTTP
>>>>>>>>> upload) to the InputStream of the object store, transfer
>>>>>>>>> operations would be far simpler and efficient.  The conflation of
>>>>>>>>> logical and physical operations also increases difficulty and
>>>>>>>>> complexity to reliably and maintainably implement cross-cutting
>>>>>>>>> storage features such as at-rest encryption.  In my opinion, the
>>>>>>>>> current design in Javelin makes progress on the first point, but
>>>>>>>>> does not address the second point.  Therefore, I propose that we
>>>>>>>>> refine the design to explicitly separate logical and physical
>>>>>>>>> operations and utilize the higher level I/O abstractions provided
>>>>>>>>> by the JDK to remove any interface
>>>>>>>>>
>>>>>>>> requirements for a file-based operations.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Based on these goals, I propose keeping the logical Image,
>>>>>>>>> ImageMotion, Volume, Template, and Snapshot services.  These
>>>>>>>>> services would be responsible for logical storage operations (.e.g
>>>>>>>>> createVolumeFromTemplate, downloadTemplate, createSnapshot,
>>>>>>>>> deleteSnapshot, etc).  To perform physical operations,  the
>>>>>>>>> StorageDevice concept would be added with the following operations:
>>>>>>>>>
>>>>>>>>> * void read(URI aURI, OutputStream anOutputStream) throws
>>>>>>>>> IOException
>>>>>>>>> * void write(URI aURI, InputStream anInputStream)  throws
>>>>>>>>> IOException
>>>>>>>>> * Set<URI> list(URI aURI)  throws IOException
>>>>>>>>> * boolean delete(URI aURI) throws IOException
>>>>>>>>> * StorageDeviceType getType()
>>>>>>>>>
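Spelled out as plain Java, the device contract John proposes above would be
something like the following sketch (StorageDeviceType being the
BLOCK/FILE_SYSTEM/OBJECT enumeration from his earlier proposal, quoted
further down in this thread):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import java.util.Set;

    enum StorageDeviceType { BLOCK, FILE_SYSTEM, OBJECT }

    interface StorageDevice {
        void read(URI aURI, OutputStream anOutputStream) throws IOException;
        void write(URI aURI, InputStream anInputStream) throws IOException;
        Set<URI> list(URI aURI) throws IOException;
        boolean delete(URI aURI) throws IOException;
        StorageDeviceType getType();
    }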
>>>>>>>>
>>>>>>>> I agree with your simplified interface, but still cautious about
>>>>>>>> the simple URI may not enough.
>>>>>>>> For example, at the driver level, what about driver developer wants
>>>>>>>> to know extra information about the object being operated?
>>>>>>>> I ended up with new APIs like:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/provider.jpg?version=1&modificationDate=1358168083079
>>>>>>>>      At the driver level, it works on two interfaces:
>>>>>>>>      DataObject, which is the interface of volume/snapshot/template.
>>>>>>>> DataStore, which is the interface of all the primary storage or
>>>>>>>> image
>>>>>>>>
>>>>>>> storage.
>>>>>>
>>>>>>> The API is pretty much looks like you proposed:
>>>>>>>> grantAccess(DataObject, EndPoint ep): make the object accessible
>>>>>>>> for an endpoint, and return an URI represent the object. This is
>>>>>>>> used during moving the object around different storages.  For
>>>>>>>> example, in the sequence diagram, create volume from template:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767,
>>>>
>>>>> datamotionstrategy will call grantaccess on both source and
>>>>>>>> destination datastore, then got two URIs represent the source and
>>>>>>>> destination object, then send the URIs to endpoint(it can be the
>>>>>>>> agent running side ssvm, or it can be a hypervisor host) to conduct
>>>>>>>> the
>>>>>>>>
>>>>>>> actual copy operation.
>>>>>>
>>>>>>> Revokeaccess: the opposite of above API.
>>>>>>>> listObjects(DataStore), list objects on datastore
>>>>>>>> createAsync(DataObject): create an object on datastore, the driver
>>>>>>>> shouldn't care about what's the object it is, but should only care
>>>>>>>> about the size of the object, the data store of the object, all of
>>>>>>>> these information can be directly inferred from DataObject. If the
>>>>>>>> driver needs more information about the object, driver developer
>>>>>>>> can get the id of the object, query database, then find about more
>>>>>>>> information. And this interface has no assumption about the
>>>>>>>> underneath storage, it can be primary storage, or s3/swift, or a
>>>>>>>> ftp server,
>>>>>>>>
>>>>>>> or whatever writable storage.
>>>>>>
>>>>>>> deleteAsync(DataObject): delete an object on a datastore, the
>>>>>>>> opposite of createAsync copyAsync(DataObject, DataObject): copy src
>>>>>>>> object to dest object. It's for storage migration. Some storage
>>>>>>>> vendor or hypervisor has its own efficient way to migrate storage
>>>>>>>> from one place to another. Most of the time, the migration across
>>>>>>>> different vendors or different storage types(primary <=> image
>>>>>>>> storage), needs to go to datamotionservice, which will be covered
>>>>>>>> later.
>>>>>>>> canCopy(DataObject, DataObject): it helps datamotionservice to make
>>>>>>>> the decision on storage migration.
>>>>>>>>
>>>>>>>> For primary storage driver, there are extra two APIs:
>>>>>>>> takeSnapshot(SnapshotInfo snapshot): take snapshot
>>>>>>>> revertSnapshot(SnapshotInfo snapshot): revert snapshot.
>>>>>>>>
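For reference, the driver-level contract Edison describes above would look
roughly like this in Java. This is only a sketch: the return types, the async
result handling, and the split between a generic driver and a primary-storage
driver are assumptions on my part, and DataObject/DataStore/EndPoint/SnapshotInfo
are the types named in the mail and the wiki diagrams, not defined here.

    import java.net.URI;
    import java.util.List;

    // Sketch of the driver API described above; names follow the mail,
    // signatures are guesses.
    interface DataStoreDriver {
        // Make the object reachable by the endpoint (agent in the ssvm or a
        // hypervisor host) and return a URI representing the object.
        URI grantAccess(DataObject obj, EndPoint ep);
        void revokeAccess(DataObject obj, EndPoint ep);

        List<DataObject> listObjects(DataStore store);

        // Async operations; completion/callback plumbing omitted here.
        void createAsync(DataObject obj);
        void deleteAsync(DataObject obj);
        void copyAsync(DataObject src, DataObject dest);
        boolean canCopy(DataObject src, DataObject dest);
    }

    // Extra operations for primary storage drivers only.
    interface PrimaryDataStoreDriver extends DataStoreDriver {
        void takeSnapshot(SnapshotInfo snapshot);
        void revertSnapshot(SnapshotInfo snapshot);
    }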
>>>>>>>>
>>>>>>>>
>>>>>>>>> This interface does not mirror any that I am aware of the current
>>>>>>>>> JDK.
>>>>>>>>> Instead, it leverages the facilities it provides to abstract I/O
>>>>>>>>> operations between different types of devices (e.g. reading data
>>>>>>>>> from a socket and writing to a file or reading data from a socket
>>>>>>>>> and writing it to
>>>>>>>>>
>>>>>>>> another socket).
>>>>>>>>
>>>>>>>>> Specifying the input or output stream allows the URI to remain
>>>>>>>>> logical and device agnostic because the device is being a physical
>>>>>>>>> stream from which to read or write with it.  Therefore, specifying
>>>>>>>>> a logical URI without the associated stream would require implicit
>>>>>>>>> assumptions to be made by the StorageDevice and clients regarding
>>>>>>>>> data acquisition.  To perform physical operations, one or more
>>>>>>>>> instances of StorageDevice would be passed into to the logical
>>>>>>>>> service methods to compose into a set of physical operations to
>>>>>>>>> perform logical operation (e.g. copying a template from secondary
>>>>>>>>>
>>>>>>>> storage to a volume).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I think our difference is only about the parameter of the API is an
>>>>>>>> URI or an Object.
>>>>>>>> Using an Object instead of a plain URI, using an object maybe more
>>>>>>>> flexible, and the DataObject itself has an API called: getURI,
>>>>>>>> which can translate the Object into an URI. See the interface of
>>>>>>>>
>>>>>>> DataObject:
>>>>
>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/data+model.jpg?version=1&modificationDate=1358171015660
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> StorageDevices are not intended to be content aware.  They simply
>>>>>>>>> map logical URIs to the physical context they represent (a path on
>>>>>>>>> a filesystem, a bucket and key in an object store, a range of
>>>>>>>>> blocks in a block store, etc) and perform the requested operation
>>>>>>>>> on the physical context (i.e. read a byte stream from the physical
>>>>>>>>> location representing "/template/2/200", delete data represented
>>>>>>>>> by "/snapshot/3/300", list the contents of the physical location
>>>>>>>>> represented by "/volume/4/400", etc).  In my opinion, it would be
>>>>>>>>> a misuse of a URI to infer an operation from their content.
>>>>>>>>> Instead, the VolumeService would expose a method such as the
>>>>>>>>> following to
>>>>>>>>>
>>>>>>>> perform the creation of a volume from a template:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> createVolumeFromTemplate(Template aTemplate, StorageDevice
>>>>>>>>> aTemplateDevice, Volume aVolume, StorageDevice aVolumeDevice,
>>>>>>>>> Hypervisor aHypervisor)
>>>>>>>>>
>>>>>>>>> The VolumeService would coordinate the creation of the volume with
>>>>>>>>> the passed hypervisor and, using the InputStream and OutputStreams
>>>>>>>>> provided by the devices, coordinate the transfer of data between
>>>>>>>>> the template storage device and the volume storage device.
>>>>>>>>> Ideally, the Template and Volume classes would encapsulate the
>>>>>>>>> rules for logical URI creation in a method.  Similarly, the
>>>>>>>>> SnapshotService would expose the a method such as the following to
>>>>>>>>> take a snapshot of a
>>>>>>>>>
>>>>>>>> volume:
>>>>>>
>>>>>>>
>>>>>>>>> createSnapshot(Volume aVolume, StorageDevice aSnapshotDevice)
>>>>>>>>>
>>>>>>>>> The SnapshotService would request the creation of a snapshot for
>>>>>>>>> the volume and then request a write of the snapshot data to the
>>>>>>>>> StorageDevice through the write method.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree, the service has rich apis, while at the driver level, the
>>>>>>>> api should be as simple and neutral to the object operated on.
>>>>>>>> I updated the sequence diagrams:
>>>>>>>> create volume from template:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767
>>>>
>>>>> add template into image storage:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/register+template+on+image+store.png?version=1&modificationDate=1358189565551
>>>>>>>> take snapshot:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965438
>>>>
>>>>> backup snapshot into image storage:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/backup+snapshot+sequence.png?version=1&modificationDate=1358192407152
>>>>
>>>>>
>>>>>>>> Could you help to review?
>>>>>>>>
>>>>>>>>
>>>>>>>>> I hope these explanations clarify both the design and motivation
>>>>>>>>> of my proposal.  I believe it is critical for the project's future
>>>>>>>>> development that the storage layer efficiently operate with
>>>>>>>>> storage devices that do not support traditional filesystems (e.g.
>>>>>>>>> object stores, raw block devices, etc).  There are a fair number
>>>>>>>>> of these types of devices which CloudStack will likely need to
>>>>>>>>> support in the future.  I believe that CloudStack will be well
>>>>>>>>> positioned to maintainability and efficiently support them if it
>>>>>>>>> carefully separates logical
>>>>>>>>>
>>>>>>>> and physical storage operations.
>>>>>>>>
>>>>>>>> Thanks for you feedback, I rewrite the API last weekend based on
>>>>>>>> your suggestion, and update the wiki:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0
>>>>>>>> The code is starting, but not checked into javelin branch yet.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Jan 9, 2013, at 8:10 PM, Edison Su <Edison.su@citrix.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>   -----Original Message-----
>>>>>>>>>>> From: John Burwell [mailto:jburwell@basho.com]
>>>>>>>>>>> Sent: Tuesday, January 08, 2013 8:51 PM
>>>>>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>>>>>> Subject: Re: new storage framework update
>>>>>>>>>>>
>>>>>>>>>>> Edison,
>>>>>>>>>>>
>>>>>>>>>>> Please see my thoughts in-line below.  I apologize for
>>>>>>>>>>> S3-centric nature of my example in advance -- it happens to be
>>>>>>>>>>> top of mind for
>>>>>>>>>>>
>>>>>>>>>> obvious reasons ...
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Jan 8, 2013, at 5:59 PM, Edison Su <Edison.su@citrix.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>   -----Original Message-----
>>>>>>>>>>>>> From: John Burwell [mailto:jburwell@basho.com]
>>>>>>>>>>>>> Sent: Tuesday, January 08, 2013 10:59 AM
>>>>>>>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>>>>>>>> Subject: Re: new storage framework update
>>>>>>>>>>>>>
>>>>>>>>>>>>> Edison,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In reviewing the javelin, I feel that there is a missing
>>>>>>>>>>>>> abstraction.
>>>>>>>>>>>>> At the lowest level, storage operations are the storage,
>>>>>>>>>>>>> retrieval, deletion, and listing of byte arrays stored at a
>>>>>>>>>>>>> particular
>>>>>>>>>>>>>
>>>>>>>>>>>> URI.
>>>>>>
>>>>>>> In order to implement this concept in the current Javelin
>>>>>>>>>>>>> branch,
>>>>>>>>>>>>> 3-5 strategy classes must implemented to perform the following
>>>>>>>>>>>>> low-level
>>>>>>>>>>>>>
>>>>>>>>>>>> operations:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>      * open(URI aDestinationURI): OutputStream throws
>>>>>>>>>>>>>
>>>>>>>>>>>> IOException
>>>>
>>>>>      * write(URI aDestinationURI, OutputStream anOutputStream)
>>>>>>>>>>>>>
>>>>>>>>>>>> throws
>>>>>>>>
>>>>>>>>> IOException
>>>>>>>>>>>>>      * list(URI aDestinationURI) : Set<URI> throws IOException
>>>>>>>>>>>>>      * delete(URI aDestinationURI) : boolean throws IOException
>>>>>>>>>>>>>
>>>>>>>>>>>>> The logic for each of these strategies will be identical which
>>>>>>>>>>>>> will lead to to the creation of a support class + glue code
>>>>>>>>>>>>> (i.e.
>>>>>>>>>>>>> either individual adapter classes
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> I realize that I omitted a couple of definitions in my original
>>>>>>>>>>> email.  First, the StorageDevice most likely would be
>>>>>>>>>>> implemented on a domain object that also contained configuration
>>>>>>>>>>> information for a resource.  For example, the S3Impl class would
>>>>>>>>>>> also implement StorageDevice.  On reflection (and a little
>>>>>>>>>>> pseudo coding), I would also like to refine my original proposed
>>>>>>>>>>> StorageDevice
>>>>>>>>>>>
>>>>>>>>>> interface:
>>>>>>
>>>>>>>
>>>>>>>>>>>       * void read(URI aURI, OutputStream anOutputStream) throws
>>>>>>>>>>>
>>>>>>>>>> IOException
>>>>>>>>>
>>>>>>>>>>       * void write(URI aURI, InputStream anInputStream)  throws
>>>>>>>>>>>
>>>>>>>>>> IOException
>>>>>>>>
>>>>>>>>>       * Set<URI> list(URI aURI)  throws IOException
>>>>>>>>>>>       * boolean delete(URI aURI) throws IOException
>>>>>>>>>>>       * StorageDeviceType getType()
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> If the lowest api is too opaque, like one URI as parameter,  I
>>>>>>>>>>>> am wondering
>>>>>>>>>>>>
>>>>>>>>>>> it may make the implementation more complicated than it sounds.
>>>>>>>>>>>
>>>>>>>>>>>> For example, there are at least 3 APIs for primary storage
>>>>>>>>>>>> driver:
>>>>>>>>>>>>
>>>>>>>>>>> createVolumeFromTemplate, createDataDisk, deleteVolume, and
>>>>>>>>>>>
>>>>>>>>>> two
>>>>>>
>>>>>>> snapshot related APIs: createSnapshot, deleteSnapshot.
>>>>>>>>>>>
>>>>>>>>>>>> How to encode above operations into simple write/delete APIs?
>>>>>>>>>>>> If one URI
>>>>>>>>>>>>
>>>>>>>>>>> contains too much information, then at the end of day, the
>>>>>>>>>>> receiver side(the code in hypervisor resource), who is
>>>>>>>>>>> responsible to decode the URI, is becoming complicated.  That's
>>>>>>>>>>> the main reason, I decide to use more specific APIs instead of one
>>>>>>>>>>>
>>>>>>>>>> opaque URI.
>>>>
>>>>> That's true, if the API is too specific, people needs to
>>>>>>>>>>>> implement ton of
>>>>>>>>>>>>
>>>>>>>>>>> APIs(mainly imagedatastoredirver, primarydatastoredriver,
>>>>>>>>>>> backupdatastoredriver), and all over the place.
>>>>>>>>>>>
>>>>>>>>>>>> Which one is better? People can jump into discuss.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> The URI scheme should be a logical, unique, and reversible value
>>>>>>>>>>> associated with the type of resource being stored.  For example,
>>>>>>>>>>> the general form of template URIs would be
>>>>>>>>>>> "/template/<account_id>/<template_id>/template.properties" and
>>>>>>>>>>> "/template/<account_id>/<template_id>/<uuid>.vhd".  Therefore,
>>>>>>>>>>> for account id 2, template id 200, the template.properties
>>>>>>>>>>> resource would be assigned a URI of
>>>>>>>>>>> "/template/2/200/template.properties".
>>>>>>
>>>>>>> The StorageDevice implementation translates the logical URI to a
>>>>>>>>>>> physical representation.  Using
>>>>>>>>>>> S3 as an example, the StorageDevice is configured to use bucket
>>>>>>>>>>> jsb-cloudstack at endpoint s3.amazonaws.com.  The S3 storage
>>>>>>>>>>> device would translate the URI to
>>>>>>>>>>> s3://jsb-cloudstack/templates/2/200/template.properties.  For an NFS
>>>>>>>>>>> storage device mounted on nfs://localhost/cloudstack, the
>>>>>>>>>>> StorageDevice would translate the logical URI to
>>>>>>>>>>> nfs://localhost/cloudstack/template/<account_id>/<template_id>/template.properties.
>>>>>>>>>>> In short, I believe that we can devise a simple
>>>>>>>>>>> scheme that allows the StorageDevice to treat the URI path
>>>>>>>>>>> relative to its root.
>>>>>>>>>>>
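To make that translation concrete, a device implementation essentially just
prefixes its own root onto the logical path. A rough sketch (not the actual
Javelin code; only the read path is shown, and the class/field names are
illustrative):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;

    // Sketch: an NFS-backed device whose export is mounted at mountPoint.
    class NfsStorageDevice {
        private final String mountPoint;   // e.g. "/mnt/cloudstack"

        NfsStorageDevice(String mountPoint) {
            this.mountPoint = mountPoint;
        }

        // "/template/2/200/template.properties" becomes
        // "/mnt/cloudstack/template/2/200/template.properties"
        private String toPhysicalPath(URI logical) {
            return mountPoint + logical.getPath();
        }

        public void read(URI aURI, OutputStream anOutputStream) throws IOException {
            try (InputStream in = new FileInputStream(toPhysicalPath(aURI))) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    anOutputStream.write(buf, 0, n);
                }
            }
        }
    }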
>>>>>>>>>>> To my mind, the createVolumeFromTemplate is decomposable into
>>>>>>>>>>>
>>>>>>>>>> a
>>>>
>>>>> series of StorageDevice#read and StorageDevice#write operations
>>>>>>>>>>> which would be issued by the VolumeManager service such as the
>>>>>>>>>>>
>>>>>>>>>> following:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> public void createVolumeFromTemplate(Template aTemplate,
>>>>>>>>>>> StorageDevice aTemplateDevice, Volume aVolume, StorageDevice
>>>>>>>>>>> aVolumeDevice) {
>>>>>>>>>>>
>>>>>>>>>>> try {
>>>>>>>>>>>
>>>>>>>>>>> if (aVolumeDevice.getType() != StorageDeviceType.BLOCK &&
>>>>>>>>>>> aVolumeDevice.getType() != StorageDeviceType.FILE_SYSTEM) {
>>>>>>>>>>>     throw new UnsupportedStorageDeviceException(...);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> // Pull the template from the template device into a temporary directory
>>>>>>>>>>> final File aTemplateDirectory = new File(<template temp path>);
>>>>>>>>>>>
>>>>>>>>>>> // Non-DRY -- likely a candidate for a TemplateService#downloadTemplate method
>>>>>>>>>>> aTemplateDevice.read(
>>>>>>>>>>>     new URI("/templates/<account_id>/<template_id>/template.properties"),
>>>>>>>>>>>     new FileOutputStream(aTemplateDirectory.createFile("template.properties")));
>>>>>>>>>>> aTemplateDevice.read(
>>>>>>>>>>>     new URI("/templates/<account_id>/<template_id>/<template_uuid>.vhd"),
>>>>>>>>>>>     new FileOutputStream(aTemplateDirectory.createFile("<template_uuid>.vhd")));
>>>>>>>>>>>
>>>>>>>>>>> // Perform operations with hypervisor as necessary to register storage,
>>>>>>>>>>> // which yields anInputStream (possibly a List<InputStream>)
>>>>>>>>>>>
>>>>>>>>>>> aVolumeDevice.write(new URI("/volume/<account_id>/<volume_id>"),
>>>>>>>>>>>     anInputStream);
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Not sure we really need the API to look like Java IO, but I can see
>>>>>>>>>> the value of using URI to encode
>>>>>>>>>> objects (volume/snapshot/template etc):
>>>>>>>>>> the driver layer API will be very simple, and can be shared by
>>>>>>>>>> multiple components (volume/image services etc). Currently, there is
>>>>>>>>> one datastore object for each storage, the datastore object mainly
>>>>>>>>> used by cloudstack mgt server, to read/write database, and to
>>>>>>>>> maintain the state of each
>>>>>>>>> object(volume/snapshot/template) in the datastore. And the
>>>>>>>>> datastore object also provides interface for lifecycle management,
>>>>>>>>> and a transformer(which can transform a db object into a *TO, or an
>>>>>>>>>
>>>>>>>> URI).
>>>>
>>>>> The purpose of datastore object is that, I want to offload a lot
>>>>>>>>> of logic from volume/template manager into each object, as the
>>>>>>>>> manager is a singleton, which is not easy to be extended.
>>>>>>>>>
>>>>>>>>>> The relationship between these classes are:
>>>>>>>>>> For volume service: Volumeserviceimpl -> primarydatastore ->
>>>>>>>>>> primarydatastoredriver For image service: imageServiceImpl ->
>>>>>>>>>> imagedataStore -> imagedataStoredriver For snapshot service:
>>>>>>>>>>
>>>>>>>>> snapshotServiceImpl -> {primarydataStore/imagedataStore} ->
>>>>>>>>> {primarydatastoredriver/imagedatastoredriver}, the snapshot can
>>>>>>>>> be on both primarydatastore and imagedatastore.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The current driver API is not good enough, it's too specific for
>>>>>>>>>> each
>>>>>>>>>>
>>>>>>>>> object.
>>>>>>
>>>>>>> For example, there will be an API called createsnapshot in
>>>>>>>>> primarydatastoredriver, and an API called moveSnapshot in
>>>>>>>>> imagedataStoredriver(in order to implement moving snapshot from
>>>>>>>>> primary storage to image store ), also may have an API called,
>>>>>>>>> createVolume in primarydatastoredriver, and an API called
>>>>>>>>> moveVolume in imagedatastoredriver(in order to implement moving
>>>>>>>>> volume from primary to image store). The more objects we add, the
>>>>>>>>> driver API will be
>>>>>>>>>
>>>>>>>> bloated.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> If driver API is using the model you suggested, the simple
>>>>>>>>>>
>>>>>>>>> read/write/delete with URI, for example:
>>>>>>>>>
>>>>>>>>>> void Create(URI uri) throws IOException void copy(URI desturi,
>>>>>>>>>> URI
>>>>>>>>>> srcUri) throws IOException boolean delete(URI uri) throws
>>>>>>>>>> IOException set<URI> list(URI uri) throws IOException
>>>>>>>>>>
>>>>>>>>>> create API has multiple means under different context: if the URI
>>>>>>>>>> has
>>>>>>>>>>
>>>>>>>>> "*/volume/*" means creating volume, if URI has "*/template" means
>>>>>>>>> creating template, and so on.
>>>>>>>>>
>>>>>>>>>> The same for copy API:
>>>>>>>>>> if both destUri and srcUri is volume, it can have different
>>>>>>>>>> meanings, if both
>>>>>>>>>>
>>>>>>>>> volumes are in the same storage, means create a volume a from a
>>>>>>>>> base volume. If both are in the different storages, means volume
>>>>>>>>>
>>>>>>>> migration.
>>>>
>>>>> If destUri is a volume, while the srcUri is a template, means,
>>>>>>>>>> create a
>>>>>>>>>>
>>>>>>>>> volume from template.
>>>>>>>>>
>>>>>>>>>> If destUri is a volume, srcUri is a snapshot and on the same
>>>>>>>>>> storage, means revert snapshot If destUri is a volume, srcUri is
>>>>>>>>>> a snapshot, but on
>>>>>>>>>>
>>>>>>>>> the different storages, means create volume from snapshot.
>>>>>>>>>
>>>>>>>>>> If destUri is a snapshot, srcUri is a volume, means create
>>>>>>>>>> snapshot from
>>>>>>>>>>
>>>>>>>>> volume.
>>>>>>>>>
>>>>>>>>>> If destUri is a snapshot, srcUri is a snapshot, but on the
>>>>>>>>>> different places,
>>>>>>>>>>
>>>>>>>>> means snapshot backup.
>>>>>>>>>
>>>>>>>>>> If destUri is a template, srcUri is a snapshot, means create
>>>>>>>>>> template from
>>>>>>>>>>
>>>>>>>>> snapshot.
>>>>>>>>>
>>>>>>>>>> As you can see, the API is too opaque, needs a complicated logic
>>>>>>>>>> to encode
>>>>>>>>>>
>>>>>>>>> and decode the URIs.
>>>>>>>>>
>>>>>>>>>> Are you OK with above API?
>>>>>>>>>>
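To illustrate the decode problem Edison points at here: the receiver of a
generic copy(destUri, srcUri) would end up with branching roughly like the
sketch below. The typeOf()/storeOf() helpers are hypothetical, only there to
show where the complexity lands; this is not CloudStack code.

    // Sketch of the decode logic a generic copy(destUri, srcUri) would need.
    class UriCopyDecoder {

        void copy(java.net.URI destUri, java.net.URI srcUri) {
            String dest = typeOf(destUri);   // "volume", "snapshot", "template", ...
            String src = typeOf(srcUri);
            boolean sameStore = storeOf(destUri).equals(storeOf(srcUri));

            if (dest.equals("volume") && src.equals("volume")) {
                if (sameStore) { /* create volume from a base volume */ }
                else           { /* volume migration */ }
            } else if (dest.equals("volume") && src.equals("template")) {
                /* create volume from template */
            } else if (dest.equals("volume") && src.equals("snapshot")) {
                if (sameStore) { /* revert snapshot */ }
                else           { /* create volume from snapshot */ }
            } else if (dest.equals("snapshot") && src.equals("volume")) {
                /* create snapshot from a volume */
            } else if (dest.equals("snapshot") && src.equals("snapshot") && !sameStore) {
                /* snapshot backup */
            } else if (dest.equals("template") && src.equals("snapshot")) {
                /* create template from snapshot */
            }
        }

        private String typeOf(java.net.URI uri) {
            // e.g. first path segment: "/volume/..." -> "volume"
            return uri.getPath().split("/")[1];
        }

        private String storeOf(java.net.URI uri) {
            // placeholder: which storage the object lives on would also
            // have to be encoded in the URI somehow
            return uri.getHost() == null ? "" : uri.getHost();
        }
    }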
>>>>>>>>>>
>>>>>>>>>>> } catch (IOException e) {
>>>>>>>>>>>
>>>>>>>>>>>          // Log and handle the error ...
>>>>>>>>>>>
>>>>>>>>>>> } finally {
>>>>>>>>>>>
>>>>>>>>>>>          // Close resources ...
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Dependent on the capabilities of the hypervisor's Java API, the
>>>>>>>>>>> temporary files may not be required, and an OutputStream could
>>>>>>>>>>> copied directly to an InputStream.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>   or a class that implements a ton of interfaces).  In addition
>>>>>>>>>>>>> to this added complexity, this segmented approach prevents the
>>>>>>>>>>>>>
>>>>>>>>>>>> implementation
>>>>>>>>>>>
>>>>>>>>>>>> of common, logical storage features such as ACL enforcement
>>>>>>>>>>>>> and asset
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is a good question, how to share the code across multiple
>>>>>>>>>>>>
>>>>>>>>>>> components.
>>>>>>>>>
>>>>>>>>>> For example, one storage can be used as both primary storage and
>>>>>>>>>>> backup storage. In the current code, developer needs to
>>>>>>>>>>> implement both primarydataStoredriver and
>>>>>>>>>>>
>>>>>>>>>> backupdatastoredriver,
>>>>
>>>>> in order to share code between these two drivers if needed, I
>>>>>>>>>>> think developer can write one driver which implements both
>>>>>>>>>>>
>>>>>>>>>> interfaces.
>>>>
>>>>>
>>>>>>>>>>> In my opinion, storage drivers classifying their usage limits
>>>>>>>>>>> functionality and composability.  Hence, my thought is that the
>>>>>>>>>>> StorageDevice should describe its capabilities -- allowing the
>>>>>>>>>>> various services (e.g. Image, Template, Volume,
>>>>>>>>>>> etc) to determine whether or not the passed storage devices can
>>>>>>>>>>> support the requested operation.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>   encryption.  With a common representation of a StorageDevice
>>>>>>>>>>>>> that operates on the standard Java I/O model, we can layer in
>>>>>>>>>>>>> cross-cutting storage operations in a consistent manner.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that nice to have a standard device model, like the
>>>>>>>>>>>> POSIX file
>>>>>>>>>>>>
>>>>>>>>>>> system API in Unix world. But I haven't figure out how to
>>>>>>>>>>> generalized all the operations on the storage, as I mentioned
>>>>>>>>>>> above.
>>>>>>>>>>>
>>>>>>>>>>>> I can think about, createvolumefromtemplate, can be generalized
>>>>>>>>>>>> as link
>>>>>>>>>>>>
>>>>>>>>>>> api, but how about taking snapshot? How about who will handle
>>>>>>>>>>> the difference between delete voume and  delete snapshot, if
>>>>>>>>>>> they are using the same delete API?
>>>>>>>>>>>
>>>>>>>>>>> The following is an snippet that would be part of the
>>>>>>>>>>> SnapshotService to take a snapshot:
>>>>>>>>>>>
>>>>>>>>>>>          // Ask the hypervisor to take a snapshot yields
>>>>>>>>>>> anInputStream
>>>>>>>>>>>
>>>>>>>>>> (e.g.
>>>>
>>>>> FileInputStream)
>>>>>>>>>>>
>>>>>>>>>>>          aSnapshotDevice.write(new
>>>>>>>>>>> URI("/snapshots/<account_id>/<**snapshot_id>), anInputStream)
>>>>>>>>>>>
>>>>>>>>>>> Ultimately, a snapshot can be exported to a single file or
>>>>>>>>>>> OutputStream which can written back out to a StorageDevice.  For
>>>>>>>>>>> deleting a snapshot, the following snippet would perform the
>>>>>>>>>>> deletion in
>>>>>>>>>>>
>>>>>>>>>> the SnapshotService:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>          // Ask the hypervisor to delete the snapshot ...
>>>>>>>>>>>
>>>>>>>>>>>          aSnapshotDevice.delete(new
>>>>>>>>>>> URI("/snapshots/<account_id>/<**snapshot_id>"))
>>>>>>>>>>>
>>>>>>>>>>> Finally, deleting a volume, the following snippet would delete a
>>>>>>>>>>> volume from
>>>>>>>>>>> VolumeService:
>>>>>>>>>>>
>>>>>>>>>>>          // Ask the hypervisor to delete the volume
>>>>>>>>>>>
>>>>>>>>>>>          aVolumeDevice.delete(new
>>>>>>>>>>> URI("/volumes/<account_id>/<**volume_id>"))
>>>>>>>>>>>
>>>>>>>>>>> In summary, I believe that the opaque operations specified in
>>>>>>>>>>> the StorageDevice interface can accomplish these goals if the
>>>>>>>>>>> following approaches are employed:
>>>>>>>>>>>
>>>>>>>>>>>          * Logical, reversible URIs are constructed by the storage
>>>>>>>>>>>
>>>>>>>>>> services.
>>>>
>>>>> These URIs are translated by the StorageDevice implementation to
>>>>>>>>>>> the semantics of the underlying device
>>>>>>>>>>>          * The storage service methods break their logic down into
>>>>>>>>>>> a series operations against one or more StorageDevices.  These
>>>>>>>>>>> operations should conform to common Java idioms because
>>>>>>>>>>>
>>>>>>>>>> StorageDevice
>>>>>>>>>
>>>>>>>>>> is built on the standard Java I/O model (i.e. InputStream,
>>>>>>>>>>> OutputStream,
>>>>>>>>>>>
>>>>>>>>>> URI).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Based on this line of thought, I propose the addition of
>>>>>>>>>>>>> following notions to the storage framework:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      * StorageType (Enumeration)
>>>>>>>>>>>>>         * BLOCK (raw block devices such as iSCSI, NBD, etc)
>>>>>>>>>>>>>         * FILE_SYSTEM (devices addressable through the
>>>>>>>>>>>>> filesystem such as local disks, NFS, etc)
>>>>>>>>>>>>>         * OBJECT (object stores such as S3 and Swift)
>>>>>>>>>>>>>      * StorageDevice (interface)
>>>>>>>>>>>>>          * open(URI aDestinationURI): OutputStream throws
>>>>>>>>>>>>>
>>>>>>>>>>>> IOException
>>>>>>
>>>>>>>          * write(URI aDestinationURI, OutputStream
>>>>>>>>>>>>> anOutputStream) throws IOException
>>>>>>>>>>>>>          * list(URI aDestinationURI) : Set<URI> throws
>>>>>>>>>>>>> IOException
>>>>>>>>>>>>>          * delete(URI aDestinationURI) : boolean throws
>>>>>>>>>>>>> IOException
>>>>>>>>>>>>>          * getType() : StorageType
>>>>>>>>>>>>>      * UnsupportedStorageDevice (unchecked exception): Thrown
>>>>>>>>>>>>>
>>>>>>>>>>>> when
>>>>>>
>>>>>>> an
>>>>>>>>>
>>>>>>>>>> unsuitable device type is provided to a storage service.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All operations on the higher level storage services (e.g.
>>>>>>>>>>>>> ImageService) would accept a StorageDevice parameter on their
>>>>>>>>>>>>> operations.  Using the type property, services can determine
>>>>>>>>>>>>> whether or not the passed device is an suitable (e.g. guarding
>>>>>>>>>>>>> against the use object store such as S3 as VM disk) --
>>>>>>>>>>>>> throwing an UnsupportedStorageDevice exception when a
>>>>>>>>>>>>>
>>>>>>>>>>>> device
>>>>
>>>>> unsuitable for
>>>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> requested operation.  The services would then perform all
>>>>>>>>>>>>> storage
>>>>>>>>>>>>>
>>>>>>>>>>>> operations through the passed StorageDevice.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> One potential gap is security.  I do not know whether or not
>>>>>>>>>>>>> authorization decisions are assumed to occur up the stack from
>>>>>>>>>>>>> the storage engine or as part of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S. I apologize for taking so long to push my feedback.  I am
>>>>>>>>>>>>> just getting back on station from the birth of our second child.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Congratulation! Thanks for your great feedback.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Dec 28, 2012, at 8:09 PM, Edison Su <Edison.su@citrix.com>
>>>>>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   -----Original Message-----
>>>>>>>>>>>>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>>>>>>>>>>>>> Sent: Friday, December 28, 2012 2:56 PM
>>>>>>>>>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>>>>>>>>>> Subject: Re: new storage framework update
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks. I'm trying to picture how this will change the
>>>>>>>>>>>>>>> existing
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> code.
>>>>>>
>>>>>>> I think it is something i will need a real example to understand.
>>>>>>>>>>>>>>> Currently we pass a
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yah, the example code is in these files:
>>>>>>>>>>>>>> XenNfsConfigurator
>>>>>>>>>>>>>> DefaultPrimaryDataStoreDriverImpl
>>>>>>>>>>>>>> DefaultPrimaryDatastoreProviderImpl
>>>>>>>>>>>>>> VolumeServiceImpl
>>>>>>>>>>>>>> DefaultPrimaryDataStore
>>>>>>>>>>>>>> XenServerStorageResource
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can start from volumeServiceTest ->
>>>>>>>>>>>>>>
>>>>>>>>>>>>> createVolumeFromTemplate
>>>>>>>>
>>>>>>>>> test
>>>>>>>>>>>>>>
>>>>>>>>>>>>> case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   storageFilerTO and/or volumeTO from the serverto the agent,
>>>>>>>>>>>>>>> and the agent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These model is not changed, what changed are the commands
>>>>>>>>>>>>>>
>>>>>>>>>>>>> send
>>>>>>
>>>>>>> to
>>>>>>>>>
>>>>>>>>>> resource. Right now, each storage protocol can send it's own
>>>>>>>>>>>>> command to resource.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> All the storage related commands are put under
>>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.cloudstack.storage.command package. Take
>>>>>>>>>>>>> CopyTemplateToPrimaryStorageCmd as an example,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It has a field called ImageOnPrimayDataStoreTO, which
>>>>>>>>>>>>>> contains a
>>>>>>>>>>>>>>
>>>>>>>>>>>>> PrimaryDataStoreTO. PrimaryDataStoreTO  contains the basic
>>>>>>>>>>>>> information about a primary storage. If needs to send extra
>>>>>>>>>>>>> information to resource, one can subclass PrimaryDataStoreTO,
>>>>>>>>>>>>>
>>>>>>>>>>>> e.g.
>>>>
>>>>> NfsPrimaryDataStoreTO, which contains nfs server ip, and nfs
>>>>>>>>>>>>>
>>>>>>>>>>>> path.
>>>>
>>>>> In this way, one can write a CLVMPrimaryDataStoreTO, which
>>>>>>>>>>>>> contains clvm's
>>>>>>>>>>>>>
>>>>>>>>>>>> own special information if
>>>>>>>>>>>
>>>>>>>>>>>> needed.   Different protocol uses different TO can simply the
>>>>>>>>>>>>>
>>>>>>>>>>>> code,
>>>>
>>>>> and
>>>>>>>>
>>>>>>>>> easier to add new storage.
>>>>>>>>>>>>>
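As a concrete example of the TO subclassing described above
(PrimaryDataStoreTO and NfsPrimaryDataStoreTO are named in the mail; the
fields shown are just a guess at the "extra information" an NFS pool would
carry, not the actual Javelin classes):

    // Base TO with the basic information every primary storage has.
    class PrimaryDataStoreTO {
        private long id;
        private String uuid;
        // getters/setters omitted
    }

    // Protocol-specific TO adding what the resource needs to mount NFS.
    class NfsPrimaryDataStoreTO extends PrimaryDataStoreTO {
        private String server;   // NFS server address
        private String path;     // exported path
    }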
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   does all of the work. Do we still need things like
>>>>>>>>>>>>>>> LibvirtStorageAdaptor to do the work on the agent side of
>>>>>>>>>>>>>>> actually managing the volumes/pools and implementing them,
>>>>>>>>>>>>>>> connecting
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> them
>>>>>>>>>>>
>>>>>>>>>>>> to
>>>>>>>>>>>>>
>>>>>>>>>>>>>> vms? So in implementing new storage we will need to write
>>>>>>>>>>>>>>> both a configurator and potentially a storage adaptor?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, that's minimal requirements.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   On Dec 27, 2012 6:41 PM, "Edison Su" <Edison.su@citrix.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>>>   Hi All,
>>>>>>>>>>>>>>>>       Before heading into holiday, I'd like to update the
>>>>>>>>>>>>>>>> current status of the new storage framework since last
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> collab12.
>>>>
>>>>>      1. Class diagram of primary storage is evolved:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/download/attachments/30741569/storage.jpg?version=1&modificationDate=1356640617613
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>            Highlight the current design:
>>>>>>>>>>>>>>>>            a.  One storage provider can cover multiple
>>>>>>>>>>>>>>>> storage protocols for multiple hypervisors. The default
>>>>>>>>>>>>>>>> storage provider can almost cover all the current primary
>>>>>>>>>>>>>>>> storage protocols. In most of cases, you don't need to
>>>>>>>>>>>>>>>> write a new storage provider, what you need to do is to
>>>>>>>>>>>>>>>> write a new storage
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> configurator.
>>>>>>>>
>>>>>>>>> Write a new storage provider needs to write a lot of code,
>>>>>>>>>>>>>>>> which we should avoid it as much as
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>           b. A new type hierarchy,
>>>>>>>>>>>>>>>> primaryDataStoreConfigurator, is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> added.
>>>>>>>>>
>>>>>>>>>> The configurator is a factory for primaryDataStore, which
>>>>>>>>>>>>>>>> assemble StorageProtocolTransformer,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PrimaryDataStoreLifeCycle
>>>>>>>>
>>>>>>>>> and PrimaryDataStoreDriver for PrimaryDataStore object,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> based
>>>>>>
>>>>>>> on the hypervisor type and the storage protocol.  For
>>>>>>>>>>>>>>>> example, for nfs primary storage on xenserver, there is a
>>>>>>>>>>>>>>>> class called XenNfsConfigurator, which put
>>>>>>>>>>>>>>>> DefaultXenPrimaryDataStoreLifeCycle,
>>>>>>>>>>>>>>>> NfsProtocolTransformer and
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> DefaultPrimaryDataStoreDriverImpl
>>>>>>
>>>>>>> into DefaultPrimaryDataStore. One provider can only have
>>>>>>>>>>>>>>>> one configurator for a pair of hypervisor type and storage
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> protocol.
>>>>
>>>>> For example, if you want to add a new nfs protocol
>>>>>>>>>>>>>>>> configurator for xenserver hypervisor, you need to write a
>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> storage provider.
>>>>>>>>
>>>>>>>>>          c. A new interface, StorageProtocolTransformer, is added.
>>>>>>>>>>>>>>>> The main purpose of this interface is to handle the
>>>>>>>>>>>>>>>> difference between different storage protocols. It has four
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> methods:
>>>>>>
>>>>>>>               getInputParamNames: return a list of name of
>>>>>>>>>>>>>>>> parameters for a particular protocol. E.g. NFS protocol has
>>>>>>>>>>>>>>>> ["server", "path"], ISCSI has ["iqn", "lun"] etc. UI
>>>>>>>>>>>>>>>> shouldn't hardcode these parameters any
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> more.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>               normalizeUserInput: given a user input from
>>>>>>>>>>>>>>>> UI/API, need to validate the input, and break it apart,
>>>>>>>>>>>>>>>> then store them into
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> database
>>>>>>>>>>>
>>>>>>>>>>>>               getDataStoreTO/ getVolumeTO: each protocol can
>>>>>>>>>>>>>>>> have its own volumeTO and primaryStorageTO. TO is the
>>>>>>>>>>>>>>>> object will be passed down to resource, if your storage has
>>>>>>>>>>>>>>>> extra information you want to pass to resource, these two
>>>>>>>>>>>>>>>> methods are the place you can
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> override.
>>>>>>>>>>>
>>>>>>>>>>>>          d. All the time-consuming API calls related to storage is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> async.
>>>>
>>>>>
>>>>>>>>>>>>>>>>         2. Minimal functionalities are implemented:
>>>>>>>>>>>>>>>>              a. Can register a http template, without SSVM
>>>>>>>>>>>>>>>>              b. Can register a NFS primary storage for
>>>>>>>>>>>>>>>> xenserver
>>>>>>>>>>>>>>>>              c. Can download a template into primary storage
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> directly
>>>>
>>>>>             d. Can create a volume from a template
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         3. All about test:
>>>>>>>>>>>>>>>>             a. TestNG test framework is used, as it can
>>>>>>>>>>>>>>>> provide parameter for each test case. For integration test,
>>>>>>>>>>>>>>>> we need to know ip address of hypervisor host, the host
>>>>>>>>>>>>>>>> uuid(if it's xenserver), the primary storage url, the
>>>>>>>>>>>>>>>> template
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> url etc.
>>>>
>>>>> These configurations are better to be parameterized, so for
>>>>>>>>>>>>>>>> each test run, we don't need to modify test case itself,
>>>>>>>>>>>>>>>> instead, we provide a test configuration file for each test
>>>>>>>>>>>>>>>> run. TestNG framework already has this functionality, I
>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> reuse it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>             b. Every pieces of code can be unit tested, which
>>>>>>>>>>>>>>>> means:
>>>>>>>>>>>>>>>>                   b.1 the xcp plugin can be unit tested. I
>>>>>>>>>>>>>>>> wrote a small python code, called mockxcpplugin.py, which
>>>>>>>>>>>>>>>> can directly call xcp
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> plugin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>                   b.2 direct agent hypervisor resource can be
>>>>>>>>>>>>>>>> tested.
>>>>>>>>>>>>>>>> I wrote a mock agent manger, which can load and initialize
>>>>>>>>>>>>>>>> hypervisor resource, and also can send command to
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> resource.
>>>>
>>>>>                   b.3 a storage integration test maven
>>>>>>>>>>>>>>>> project is created, which can test the whole storage
>>>>>>>>>>>>>>>> subsystem, such as create volume from template, which
>>>>>>>>>>>>>>>> including both image and volume
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> components.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>             A new section, called "how to test", is added
>>>>>>>>>>>>>>>> into
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0,
>>>>>>>>>>>>>>>> please check it out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>        The code is on the javelin branch, the maven projects
>>>>>>>>>>>>>>>> whose name starting from cloud-engine-storage-* are the
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> code
>>>>>>
>>>>>>> related to storage subsystem. Most of the primary storage
>>>>>>>>>>>>>>>> code is in cloud-engine-storage-volume project.
>>>>>>>>>>>>>>>>         Any feedback/comment is appreciated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>
