Mailing-List: contact cloudstack-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cloudstack-dev@incubator.apache.org
Received-SPF: pass (athena.apache.org: domain of Edison.su@citrix.com
 designates 66.165.176.63 as permitted sender)
From: Edison Su <Edison.su@citrix.com>
To: "cloudstack-dev@incubator.apache.org"
	<cloudstack-dev@incubator.apache.org>
Date: Tue, 15 Jan 2013 17:35:53 -0800
Subject: RE: new storage framework update
Thread-Topic: new storage framework update
Thread-Index: Ac3wOoat0bj0ndJmT3icihkBXmcedgCaXwagADZOUwA=
Message-ID: 
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A576F@SJCPMAILBOX01.citrite.net>
References: 
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A5710@SJCPMAILBOX01.citrite.net>
 <CALFpzo7XbyTzG_dLuoCkvpPsqSBhxy2LdD7CUrEc5xm49sfX5w@mail.gmail.com>
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A571F@SJCPMAILBOX01.citrite.net>
 <E1FFEC96-8D87-4A05-A418-316FAEB3EFF0@basho.com>
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A574B@SJCPMAILBOX01.citrite.net>
 <4EADDB6B-D91F-41F8-BEED-F4944566CAEE@basho.com>
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A5755@SJCPMAILBOX01.citrite.net>
 <9506226C-2CDC-4FD5-AA7C-879D994959FB@basho.com>
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A5764@SJCPMAILBOX01.citrite.net>
In-Reply-To: 
 <C66C814C1ABFA8449FF65CB44953A8B4012CBF3A5764@SJCPMAILBOX01.citrite.net>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

After a lengthy discussion(more than two hours) with John on Skype, I think=
 we figured out the difference between us.  The API proposed by John is mor=
e at the execution level, that's where input/output stream coming from, whi=
ch assumes that both source and destination object will be operated at the =
same place(either inside ssvm, or on hypervisor host). While the API I prop=
osed is more about how to hook up vendor's own storage into cloudstack's mg=
t server, thus can replace the process on how and where to operate on the s=
torage.
Let's talk about the execution model at first, which will have huge impact =
on the design we made. The execution model is about where to execute operat=
ions issued by mgt server. Currently, there is no universal execution model=
, it's quite different for each hypervisor.
 E.g. for KVM, mgt server will send commands to KVM host, there is a java a=
gent running on kvm host, which can execute command send by mgt server.
For xenserver, most of commands will be executed on mgt server, which will =
call xapi, then talking to xenserver host.  But we do put some python code =
at xenserver host, if there are operations not supported by xapi.
For vmware, most of commands will be executed on mgt server, which talking =
to vcenter API, while some of them will be executed inside SSVM.
Due to the different execution models, we'll get into a problem about how a=
nd where to access storage device. For example, there is a storage box, whi=
ch has its own management API to be accessed. Now I want to create a volume=
 on the storage box, where should I call stoage box's create volume api? If=
 we follow up above execution models, we need to call the api at different =
places and even worse, you need to write the API call in different language=
s. For kvm, you may need to write java code in kvm agent, for xenserver, yo=
u may need to write a xapi python plugin, for vmware, you may need to put t=
he java code inside ssvm  etc.
But if the storage box already has management api, why just call it inside =
cloudstack mgt server, then device vendor should just write java code once,=
 for all the different hypervisors? If we don't enforce the execution model=
, then the storage framework should have a hook in management server, devic=
e vendor can decide where to execute commands send by mgt server.
That's my datastoredriver layer used for. Take taking snapshot diagram as a=
n example: https://cwiki.apache.org/confluence/download/attachments/3074156=
9/take+snapshot+sequence.png?version=3D1&modificationDate=3D1358189965000
Datastoredriver is running inside mgt server, while datastoredriver itself =
can decide where to execute "takasnapshot" API, driver can send a command t=
o hypervisor host, or directly call storage box's API, or directly call hyp=
ervisor's own API, or another service running outside of cloudstack mgt ser=
ver. It's all up to the implementation of driver.
Does it make sense? If it's true, the device driver should not take input/o=
ut stream as parameter, as it enforces the execution model, which I don't t=
hink it's necessary.
BTW, John and I will discuss the matter tomorrow on Skype, if you want to j=
oin, please let me know.

> -----Original Message-----
> From: Edison Su [mailto:Edison.su@citrix.com]
> Sent: Monday, January 14, 2013 3:19 PM
> To: cloudstack-dev@incubator.apache.org
> Subject: RE: new storage framework update
>
>
>
> > -----Original Message-----
> > From: John Burwell [mailto:jburwell@basho.com]
> > Sent: Friday, January 11, 2013 12:30 PM
> > To: cloudstack-dev@incubator.apache.org
> > Subject: Re: new storage framework update
> >
> > Edison,
> >
> > I think we are speaking past each other a bit.  My intention is to
> > separate logical and physical storage operations in order to simplify
> > the implementation of new storage providers.  Also, in order to
> > support the widest range of storage mechanisms, I want to eliminate
> > all interface assumptions (implied and explicit) that a storage device
> > supports a file
>
> I think if the nfs secondary storage is optional, then all the inefficien=
t related
> to object storage will get away?
>
> > system.  These two issues make implementation of efficient  storage
> > drivers extremely difficult.  For example, for object stores, we have
> > to create polling synchronization threads that add complexity,
> > overhead, and latency to the system.  If we could connect the
> > OutputStream of a source (such as an HTTP
> > upload) to the InputStream of the object store, transfer operations
> > would be far simpler and efficient.  The conflation of logical and
> > physical operations also increases difficulty and complexity to
> > reliably and maintainably implement cross-cutting storage features
> > such as at-rest encryption.  In my opinion, the current design in
> > Javelin makes progress on the first point, but does not address the
> > second point.  Therefore, I propose that we refine the design to
> > explicitly separate logical and physical operations and utilize the
> > higher level I/O abstractions provided by the JDK to remove any interfa=
ce
> requirements for a file-based operations.
> >
> > Based on these goals, I propose keeping the logical Image,
> > ImageMotion, Volume, Template, and Snapshot services.  These services
> > would be responsible for logical storage operations (.e.g
> > createVolumeFromTemplate, downloadTemplate, createSnapshot,
> > deleteSnapshot, etc).  To perform physical operations,  the
> > StorageDevice concept would be added with the following operations:
> >
> > * void read(URI aURI, OutputStream anOutputStream) throws IOException
> > * void write(URI aURI, InputStream anInputStream)  throws IOException
> > * Set<URI> list(URI aURI)  throws IOException
> > * boolean delete(URI aURI) throws IOException
> > * StorageDeviceType getType()
>
> I agree with your simplified interface, but still cautious about the simp=
le URI
> may not enough.
> For example, at the driver level, what about driver developer wants to kn=
ow
> extra information about the object being operated?
> I ended up with new APIs like:
> https://cwiki.apache.org/confluence/download/attachments/30741569/prov
> ider.jpg?version=3D1&modificationDate=3D1358168083079
>  At the driver level, it works on two interfaces:
>  DataObject, which is the interface of volume/snapshot/template.
> DataStore, which is the interface of all the primary storage or image sto=
rage.
> The API is pretty much looks like you proposed:
> grantAccess(DataObject, EndPoint ep): make the object accessible for an
> endpoint, and return an URI represent the object. This is used during mov=
ing
> the object around different storages.  For example, in the sequence diagr=
am,
> create volume from template:
> https://cwiki.apache.org/confluence/download/attachments/30741569/crea
> tevolumeFromtemplate.png?version=3D1&modificationDate=3D1358172931767,
> datamotionstrategy will call grantaccess on both source and destination
> datastore, then got two URIs represent the source and destination object,
> then send the URIs to endpoint(it can be the agent running side ssvm, or =
it
> can be a hypervisor host) to conduct the actual copy operation.
> Revokeaccess: the opposite of above API.
> listObjects(DataStore), list objects on datastore
> createAsync(DataObject): create an object on datastore, the driver should=
n't
> care about what's the object it is, but should only care about the size o=
f the
> object, the data store of the object, all of these information can be dir=
ectly
> inferred from DataObject. If the driver needs more information about the
> object, driver developer can get the id of the object, query database, th=
en
> find about more information. And this interface has no assumption about t=
he
> underneath storage, it can be primary storage, or s3/swift, or a ftp serv=
er, or
> whatever writable storage.
> deleteAsync(DataObject): delete an object on a datastore, the opposite of
> createAsync copyAsync(DataObject, DataObject): copy src object to dest
> object. It's for storage migration. Some storage vendor or hypervisor has=
 its
> own efficient way to migrate storage from one place to another. Most of t=
he
> time, the migration across different vendors or different storage
> types(primary <=3D> image storage), needs to go to datamotionservice, whi=
ch
> will be covered later.
> canCopy(DataObject, DataObject): it helps datamotionservice to make the
> decision on storage migration.
>
> For primary storage driver, there are extra two APIs:
> takeSnapshot(SnapshotInfo snapshot): take snapshot
> revertSnapshot(SnapshotInfo snapshot): revert snapshot.
>
>
> >
> > This interface does not mirror any that I am aware of the current JDK.
> > Instead, it leverages the facilities it provides to abstract I/O
> > operations between different types of devices (e.g. reading data from
> > a socket and writing to a file or reading data from a socket and writin=
g it to
> another socket).
> > Specifying the input or output stream allows the URI to remain logical
> > and device agnostic because the device is being a physical stream from
> > which to read or write with it.  Therefore, specifying a logical URI
> > without the associated stream would require implicit assumptions to be
> > made by the StorageDevice and clients regarding data acquisition.  To
> > perform physical operations, one or more instances of StorageDevice
> > would be passed into to the logical service methods to compose into a
> > set of physical operations to perform logical operation (e.g. copying
> > a template from secondary storage to a volume).
>
>
> I think our difference is only about the parameter of the API is an URI o=
r an
> Object.
> Using an Object instead of a plain URI, using an object maybe more flexib=
le,
> and the DataObject itself has an API called: getURI, which can translate =
the
> Object into an URI. See the interface of DataObject:
> https://cwiki.apache.org/confluence/download/attachments/30741569/data
> +model.jpg?version=3D1&modificationDate=3D1358171015660
>
>
> >
> > StorageDevices are not intended to be content aware.  They simply map
> > logical URIs to the physical context they represent (a path on a
> > filesystem, a bucket and key in an object store, a range of blocks in
> > a block store, etc) and perform the requested operation on the
> > physical context (i.e. read a byte stream from the physical location
> > representing "/template/2/200", delete data represented by
> > "/snapshot/3/300", list the contents of the physical location
> > represented by "/volume/4/400", etc).  In my opinion, it would be a
> > misuse of a URI to infer an operation from their content.  Instead,
> > the VolumeService would expose a method such as the following to
> perform the creation of a volume from a template:
> >
> > createVolumeFromTemplate(Template aTemplate, StorageDevice
> > aTemplateDevice, Volume aVolume, StorageDevice aVolumeDevice,
> > Hypervisor aHypervisor)
> >
> > The VolumeService would coordinate the creation of the volume with the
> > passed hypervisor and, using the InputStream and OutputStreams
> > provided by the devices, coordinate the transfer of data between the
> > template storage device and the volume storage device. Ideally, the
> > Template and Volume classes would encapsulate the rules for logical
> > URI creation in a method.  Similarly, the SnapshotService would expose
> > the a method such as the following to take a snapshot of a volume:
> >
> > createSnapshot(Volume aVolume, StorageDevice aSnapshotDevice)
> >
> > The SnapshotService would request the creation of a snapshot for the
> > volume and then request a write of the snapshot data to the
> > StorageDevice through the write method.
>
> I agree, the service has rich apis, while at the driver level, the api sh=
ould be as
> simple and neutral to the object operated on.
> I updated the sequence diagrams:
> create volume from template:
> https://cwiki.apache.org/confluence/download/attachments/30741569/crea
> tevolumeFromtemplate.png?version=3D1&modificationDate=3D1358172931767
> add template into image storage:
> https://cwiki.apache.org/confluence/download/attachments/30741569/regi
> ster+template+on+image+store.png?version=3D1&modificationDate=3D13581895
> 65551
> take snapshot:
> https://cwiki.apache.org/confluence/download/attachments/30741569/take
> +snapshot+sequence.png?version=3D1&modificationDate=3D1358189965438
> backup snapshot into image storage:
> https://cwiki.apache.org/confluence/download/attachments/30741569/back
> up+snapshot+sequence.png?version=3D1&modificationDate=3D1358192407152
>
> Could you help to review?
>
> >
> > I hope these explanations clarify both the design and motivation of my
> > proposal.  I believe it is critical for the project's future
> > development that the storage layer efficiently operate with storage
> > devices that do not support traditional filesystems (e.g. object
> > stores, raw block devices, etc).  There are a fair number of these
> > types of devices which CloudStack will likely need to support in the
> > future.  I believe that CloudStack will be well positioned to
> > maintainability and efficiently support them if it carefully separates =
logical
> and physical storage operations.
>
> Thanks for you feedback, I rewrite the API last weekend based on your
> suggestion, and update the wiki:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsyst
> em+2.0
> The code is starting, but not checked into javelin branch yet.
>
> >
> > Thanks,
> > -John
> >
> > On Jan 9, 2013, at 8:10 PM, Edison Su <Edison.su@citrix.com> wrote:
> >
> > >
> > >
> > >> -----Original Message-----
> > >> From: John Burwell [mailto:jburwell@basho.com]
> > >> Sent: Tuesday, January 08, 2013 8:51 PM
> > >> To: cloudstack-dev@incubator.apache.org
> > >> Subject: Re: new storage framework update
> > >>
> > >> Edison,
> > >>
> > >> Please see my thoughts in-line below.  I apologize for S3-centric
> > >> nature of my example in advance -- it happens to be top of mind for
> > obvious reasons ...
> > >>
> > >> Thanks,
> > >> -John
> > >>
> > >> On Jan 8, 2013, at 5:59 PM, Edison Su <Edison.su@citrix.com> wrote:
> > >>
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: John Burwell [mailto:jburwell@basho.com]
> > >>>> Sent: Tuesday, January 08, 2013 10:59 AM
> > >>>> To: cloudstack-dev@incubator.apache.org
> > >>>> Subject: Re: new storage framework update
> > >>>>
> > >>>> Edison,
> > >>>>
> > >>>> In reviewing the javelin, I feel that there is a missing abstracti=
on.
> > >>>> At the lowest level, storage operations are the storage,
> > >>>> retrieval, deletion, and listing of byte arrays stored at a partic=
ular URI.
> > >>>> In order to implement this concept in the current Javelin branch,
> > >>>> 3-5 strategy classes must implemented to perform the following
> > >>>> low-level
> > >> operations:
> > >>>>
> > >>>>  * open(URI aDestinationURI): OutputStream throws IOException
> > >>>>  * write(URI aDestinationURI, OutputStream anOutputStream)
> throws
> > >>>> IOException
> > >>>>  * list(URI aDestinationURI) : Set<URI> throws IOException
> > >>>>  * delete(URI aDestinationURI) : boolean throws IOException
> > >>>>
> > >>>> The logic for each of these strategies will be identical which
> > >>>> will lead to to the creation of a support class + glue code (i.e.
> > >>>> either individual adapter classes
> > >>
> > >> I realize that I omitted a couple of definitions in my original
> > >> email.  First, the StorageDevice most likely would be implemented
> > >> on a domain object that also contained configuration information
> > >> for a resource.  For example, the S3Impl class would also implement
> > >> StorageDevice.  On reflection (and a little pseudo coding), I would
> > >> also like to refine my original proposed StorageDevice interface:
> > >>
> > >>   * void read(URI aURI, OutputStream anOutputStream) throws
> > IOException
> > >>   * void write(URI aURI, InputStream anInputStream)  throws
> IOException
> > >>   * Set<URI> list(URI aURI)  throws IOException
> > >>   * boolean delete(URI aURI) throws IOException
> > >>   * StorageDeviceType getType()
> > >>
> > >>>
> > >>> If the lowest api is too opaque, like one URI as parameter,  I am
> > >>> wondering
> > >> it may make the implementation more complicated than it sounds.
> > >>> For example, there are at least 3 APIs for primary storage driver:
> > >> createVolumeFromTemplate, createDataDisk, deleteVolume, and two
> > >> snapshot related APIs: createSnapshot, deleteSnapshot.
> > >>> How to encode above operations into simple write/delete APIs? If
> > >>> one URI
> > >> contains too much information, then at the end of day, the receiver
> > >> side(the code in hypervisor resource), who is responsible to decode
> > >> the URI, is becoming complicated.  That's the main reason, I decide
> > >> to use more specific APIs instead of one opaque URI.
> > >>> That's true, if the API is too specific, people needs to implement
> > >>> ton of
> > >> APIs(mainly imagedatastoredirver, primarydatastoredriver,
> > >> backupdatastoredriver), and all over the place.
> > >>> Which one is better? People can jump into discuss.
> > >>>
> > >>
> > >> The URI scheme should be a logical, unique, and reversal values
> > >> associated with the type of resource being stored.  For example,
> > >> the general form of template URIs would
> > >> "/template/<account_id>/<template_id>/template.properties" and
> > >> "/template/<account_id>/<template_id>/<uuid>.vhd" .  Therefore, for
> > >> account id 2, template id 200, the template.properties resource
> > >> would be assigned a URI of "/template/2/200/template.properties.
> > >> The StorageDevice implementation translates the logical URI to a
> > >> physical representation.  Using
> > >> S3 as an example, the StorageDevice is configured to use bucket
> > >> jsb- cloudstack at endpoint s3.amazonaws.com.  The S3 storage
> > >> device would translate the URI to s3://jsb-
> > >> cloudstack/templates/2/200/template.properties.  For an NFS storage
> > >> device mounted on nfs://localhost/cloudstack, the StorageDevice
> > >> would translate the logical URI to
> > >>
> > hfs://localhost/cloudstack/template/<account_id>/<template_id>/templa
> > >> te .properties.  In short, I believe that we can devise a simple
> > >> scheme that allows the StorageDevice to treat the URI path relative
> > >> to its root.
> > >>
> > >> To my mind, the createVolumeFromTemplate is decomposable into a
> > >> series of StorageDevice#read and StorageDevice#write operations
> > >> which would be issued by the VolumeManager service such as the
> following:
> > >>
> > >> public void createVolumeFromTemplate(Template aTemplate,
> > >> StorageDevice aTemplateDevice, Volume aVolume, StorageDevice
> > >> aVolumeDevice) {
> > >>
> > >> try {
> > >>
> > >> if (aVolumeDevice.getType() !=3D StorageDeviceType.BLOCK ||
> > >> aVolumeDevice.getType() !=3D StorageDeviceType.FILE_SYSTEM) { throw
> > new
> > >> UnsupportedStorageDeviceException(...);
> > >> }
> > >>
> > >> // Pull the template from template device into a temporary
> > >> directory final File aTemplateDirectory =3D new File(<template temp
> > >> path>)
> > >>
> > >> // Non-DRY -- likely a candidate for a
> > >> TemplateService#downloadTemplate method
> > aTemplateDevice.read(new
> > >> URI("/templates/<account_id>/<template_id>/template.properties"),
> > new
> > >> FileOutStream(aTemplateDirectory.createFille("template.properties")
> > >> );
> > >> aTemplate.read(new
> > >> URI("/templates/<account_id>/<template_id>/<template_uuid>.vhd"),
> > >> new
> > >>
> FileOutputStream(aTemplateDirectory.createFile("<template_uuid>.vhd
> > >> ")
> > >> ;
> > >>
> > >> // Perform operations with hypervisor as necessary to register
> > >> storage which yields // anInputStream (possibly a
> > >> List<InputStream>)
> > >>
> > >> aVolumeDevice.write(new URI("/volume/<account_id>/<volume_id>",
> > >> anInputStream);
> > >
> > >
> > > Not sure we really need the API looks like java IO, but I can see
> > > the value of using URI to encode objects(volume/snapshot/template etc=
):
> > driver layer API will be very simple, and can be shared by multiple
> > components(volume/image services etc) Currently, there is one
> > datastore object for each storage, the datastore object mainly used by
> > cloudstack mgt server, to read/write database, and to maintain the
> > state of each
> > object(volume/snapshot/template) in the datastore. And the datastore
> > object also provides interface for lifecycle management, and a
> > transformer(which can transform a db object into a *TO, or an URI).
> > The purpose of datastore object is that, I want to offload a lot of
> > logic from volume/template manager into each object, as the manager is
> > a singleton, which is not easy to be extended.
> > > The relationship between these classes are:
> > > For volume service: Volumeserviceimpl -> primarydatastore ->
> > > primarydatastoredriver For image service: imageServiceImpl ->
> > > imagedataStore -> imagedataStoredriver For snapshot service:
> > snapshotServiceImpl -> {primarydataStore/imagedataStore} - >
> > {primarydatastoredriver/imagedatastoredriver}, the snapshot can be on
> > both primarydatastore and imagedatastore.
> > >
> > > The current driver API is not good enough, it's too specific for each=
 object.
> > For example, there will be an API called createsnapshot in
> > primarydatastoredriver, and an API called moveSnapshot in
> > imagedataStoredriver(in order to implement moving snapshot from
> > primary storage to image store ), also may have an API called,
> > createVolume in primarydatastoredriver, and an API called moveVolume
> > in imagedatastoredriver(in order to implement moving volume from
> > primary to image store). The more objects we add, the driver API will b=
e
> bloated.
> > >
> > > If driver API is using the model you suggested, the simple
> > read/write/delete with URI, for example:
> > > void Create(URI uri) throws IOException void copy(URI desturi, URI
> > > srcUri) throws IOException boolean delete(URI uri) throws
> > > IOException set<URI> list(URI uri) throws IOException
> > >
> > > create API has multiple means under different context: if the URI
> > > has
> > "*/volume/*" means creating volume, if URI has "*/template" means
> > creating template, and so on.
> > > The same for copy API:
> > > if both destUri and srcUri is volume, it can have different
> > > meanings, if both
> > volumes are in the same storage, means create a volume a from a base
> > volume. If both are in the different storages, means volume migration.
> > > If destUri is a volume, while the srcUri is a template, means,
> > > create a
> > volume from template.
> > > If destUri is a volume, srcUri is a snapshot and on the same
> > > storage, means revert snapshot If destUri is a volume, srcUri is a
> > > snapshot, but on
> > the different storages, means create volume from snapshot.
> > > If destUri is a snapshot, srcUri is a volume, means create snapshot
> > > from
> > volume.
> > > If destUri is a snapshot, srcUri is a snapshot, but on the different
> > > places,
> > means snapshot backup.
> > > If destUri is a template, srcUri is a snapshot, means create
> > > template from
> > snapshot.
> > > As you can see, the API is too opaque, needs a complicated logic to
> > > encode
> > and decode the URIs.
> > > Are you OK with above API?
> > >
> > >>
> > >> } catch (IOException e) {
> > >>
> > >>      // Log and handle the error ...
> > >>
> > >> } finally {
> > >>
> > >>      // Close resources ...
> > >>
> > >> }
> > >>
> > >> }
> > >>
> > >> Dependent on the capabilities of the hypervisor's Java API, the
> > >> temporary files may not be required, and an OutputStream could
> > >> copied directly to an InputStream.
> > >>
> > >>>
> > >>>> or a class that implements a ton of interfaces).  In addition to
> > >>>> this added complexity, this segmented approach prevents the
> > >> implementation
> > >>>> of common, logical storage features such as ACL enforcement and
> > >>>> asset
> > >>>
> > >>> This is a good question, how to share the code across multiple
> > components.
> > >> For example, one storage can be used as both primary storage and
> > >> backup storage. In the current code, developer needs to implement
> > >> both primarydataStoredriver and backupdatastoredriver, in order to
> > >> share code between these two drivers if needed, I think developer
> > >> can write one driver which implements both interfaces.
> > >>
> > >> In my opinion, storage drivers classifying their usage limits
> > >> functionality and composability.  Hence, my thought is that the
> > >> StorageDevice should describe its capabilities -- allowing the
> > >> various services (e.g. Image, Template, Volume,
> > >> etc) to determine whether or not the passed storage devices can
> > >> support the requested operation.
> > >>
> > >>>
> > >>>> encryption.  With a common representation of a StorageDevice that
> > >>>> operates on the standard Java I/O model, we can layer in
> > >>>> cross-cutting storage operations in a consistent manner.
> > >>>
> > >>> I agree that nice to have a standard device model, like the POSIX
> > >>> file
> > >> system API in Unix world. But I haven't figure out how to
> > >> generalized all the operations on the storage, as I mentioned above.
> > >>> I can think about, createvolumefromtemplate, can be generalized as
> > >>> link
> > >> api, but how about taking snapshot? How about who will handle the
> > >> difference between delete voume and  delete snapshot, if they are
> > >> using the same delete API?
> > >>
> > >> The following is an snippet that would be part of the
> > >> SnapshotService to take a snapshot:
> > >>
> > >>      // Ask the hypervisor to take a snapshot yields anInputStream (=
e.g.
> > >> FileInputStream)
> > >>
> > >>      aSnapshotDevice.write(new
> > >> URI("/snapshots/<account_id>/<snapshot_id>), anInputStream)
> > >>
> > >> Ultimately, a snapshot can be exported to a single file or
> > >> OutputStream which can written back out to a StorageDevice.  For
> > >> deleting a snapshot, the following snippet would perform the
> > >> deletion in
> > the SnapshotService:
> > >>
> > >>      // Ask the hypervisor to delete the snapshot ...
> > >>
> > >>      aSnapshotDevice.delete(new
> > >> URI("/snapshots/<account_id>/<snapshot_id>"))
> > >>
> > >> Finally, deleting a volume, the following snippet would delete a
> > >> volume from
> > >> VolumeService:
> > >>
> > >>      // Ask the hypervisor to delete the volume
> > >>
> > >>      aVolumeDevice.delete(new
> > >> URI("/volumes/<account_id>/<volume_id>"))
> > >>
> > >> In summary, I believe that the opaque operations specified in the
> > >> StorageDevice interface can accomplish these goals if the following
> > >> approaches are employed:
> > >>
> > >>      * Logical, reversible URIs are constructed by the storage servi=
ces.
> > >> These URIs are translated by the StorageDevice implementation to
> > >> the semantics of the underlying device
> > >>      * The storage service methods break their logic down into a
> > >> series operations against one or more StorageDevices.  These
> > >> operations should conform to common Java idioms because
> > StorageDevice
> > >> is built on the standard Java I/O model (i.e. InputStream,
> > >> OutputStream,
> > URI).
> > >>
> > >> Thanks,
> > >> -John
> > >>
> > >>>
> > >>>>
> > >>>> Based on this line of thought, I propose the addition of
> > >>>> following notions to the storage framework:
> > >>>>
> > >>>>  * StorageType (Enumeration)
> > >>>>     * BLOCK (raw block devices such as iSCSI, NBD, etc)
> > >>>>     * FILE_SYSTEM (devices addressable through the filesystem
> > >>>> such as local disks, NFS, etc)
> > >>>>     * OBJECT (object stores such as S3 and Swift)
> > >>>>  * StorageDevice (interface)
> > >>>>      * open(URI aDestinationURI): OutputStream throws IOException
> > >>>>      * write(URI aDestinationURI, OutputStream anOutputStream)
> > >>>> throws IOException
> > >>>>      * list(URI aDestinationURI) : Set<URI> throws IOException
> > >>>>      * delete(URI aDestinationURI) : boolean throws IOException
> > >>>>      * getType() : StorageType
> > >>>>  * UnsupportedStorageDevice (unchecked exception): Thrown when
> > an
> > >>>> unsuitable device type is provided to a storage service.
> > >>>>
> > >>>> All operations on the higher level storage services (e.g.
> > >>>> ImageService) would accept a StorageDevice parameter on their
> > >>>> operations.  Using the type property, services can determine
> > >>>> whether or not the passed device is an suitable (e.g. guarding
> > >>>> against the use object store such as S3 as VM disk) -- throwing
> > >>>> an UnsupportedStorageDevice exception when a device unsuitable
> > >>>> for
> > the
> > >>>> requested operation.  The services would then perform all storage
> > >> operations through the passed StorageDevice.
> > >>>>
> > >>>> One potential gap is security.  I do not know whether or not
> > >>>> authorization decisions are assumed to occur up the stack from
> > >>>> the storage engine or as part of it.
> > >>>>
> > >>>> Thanks,
> > >>>> -John
> > >>>>
> > >>>> P.S. I apologize for taking so long to push my feedback.  I am
> > >>>> just getting back on station from the birth of our second child.
> > >>>
> > >>>
> > >>> Congratulation! Thanks for your great feedback.
> > >>>
> > >>>>
> > >>>> On Dec 28, 2012, at 8:09 PM, Edison Su <Edison.su@citrix.com> wrot=
e:
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> > >>>>>> Sent: Friday, December 28, 2012 2:56 PM
> > >>>>>> To: cloudstack-dev@incubator.apache.org
> > >>>>>> Subject: Re: new storage framework update
> > >>>>>>
> > >>>>>> Thanks. I'm trying to picture how this will change the existing =
code.
> > >>>>>> I think it is something i will need a real example to understand=
.
> > >>>>>> Currently we pass a
> > >>>>> Yah, the example code is in these files:
> > >>>>> XenNfsConfigurator
> > >>>>> DefaultPrimaryDataStoreDriverImpl
> > >>>>> DefaultPrimaryDatastoreProviderImpl
> > >>>>> VolumeServiceImpl
> > >>>>> DefaultPrimaryDataStore
> > >>>>> XenServerStorageResource
> > >>>>>
> > >>>>> You can start from volumeServiceTest ->
> createVolumeFromTemplate
> > >>>>> test
> > >>>> case.
> > >>>>>
> > >>>>>> storageFilerTO and/or volumeTO from the serverto the agent, and
> > >>>>>> the agent
> > >>>>> These model is not changed, what changed are the commands send
> > to
> > >>>> resource. Right now, each storage protocol can send it's own
> > >>>> command to resource.
> > >>>>> All the storage related commands are put under
> > >>>> org.apache.cloudstack.storage.command package. Take
> > >>>> CopyTemplateToPrimaryStorageCmd as an example,
> > >>>>> It has a field called ImageOnPrimayDataStoreTO, which contains a
> > >>>> PrimaryDataStoreTO. PrimaryDataStoreTO  contains the basic
> > >>>> information about a primary storage. If needs to send extra
> > >>>> information to resource, one can subclass PrimaryDataStoreTO, e.g.
> > >>>> NfsPrimaryDataStoreTO, which contains nfs server ip, and nfs path.
> > >>>> In this way, one can write a CLVMPrimaryDataStoreTO, which
> > >>>> contains clvm's
> > >> own special information if
> > >>>> needed.   Different protocol uses different TO can simply the code=
,
> and
> > >>>> easier to add new storage.
> > >>>>>
> > >>>>>> does all of the work. Do we still need things like
> > >>>>>> LibvirtStorageAdaptor to do the work on the agent side of
> > >>>>>> actually managing the volumes/pools and implementing them,
> > >>>>>> connecting
> > >> them
> > >>>> to
> > >>>>>> vms? So in implementing new storage we will need to write both
> > >>>>>> a configurator and potentially a storage adaptor?
> > >>>>>
> > >>>>> Yes, that's minimal requirements.
> > >>>>>
> > >>>>>> On Dec 27, 2012 6:41 PM, "Edison Su" <Edison.su@citrix.com>
> wrote:
> > >>>>>>
> > >>>>>>> Hi All,
> > >>>>>>>   Before heading into holiday, I'd like to update the current
> > >>>>>>> status of the new storage framework since last collab12.
> > >>>>>>>  1. Class diagram of primary storage is evolved:
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/download/attachments/30741569/sto
> > >>>>>> r
> > >>>>>> age.jpg?version=3D1&modificationDate=3D1356640617613
> > >>>>>>>        Highlight the current design:
> > >>>>>>>        a.  One storage provider can cover multiple storage
> > >>>>>>> protocols for multiple hypervisors. The default storage
> > >>>>>>> provider can almost cover all the current primary storage
> > >>>>>>> protocols. In most of cases, you don't need to write a new
> > >>>>>>> storage provider, what you need to do is to write a new storage
> configurator.
> > >>>>>>> Write a new storage provider needs to write a lot of code,
> > >>>>>>> which we should avoid it as much as
> > >>>>>> possible.
> > >>>>>>>       b. A new type hierarchy, primaryDataStoreConfigurator,
> > >>>>>>> is
> > added.
> > >>>>>>> The configurator is a factory for primaryDataStore, which
> > >>>>>>> assemble StorageProtocolTransformer,
> PrimaryDataStoreLifeCycle
> > >>>>>>> and PrimaryDataStoreDriver for PrimaryDataStore object, based
> > >>>>>>> on the hypervisor type and the storage protocol.  For example,
> > >>>>>>> for nfs primary storage on xenserver, there is a class called
> > >>>>>>> XenNfsConfigurator, which put
> > >>>>>>> DefaultXenPrimaryDataStoreLifeCycle,
> > >>>>>>> NfsProtocolTransformer and DefaultPrimaryDataStoreDriverImpl
> > >>>>>>> into DefaultPrimaryDataStore. One provider can only have one
> > >>>>>>> configurator for a pair of hypervisor type and storage protocol=
.
> > >>>>>>> For example, if you want to add a new nfs protocol
> > >>>>>>> configurator for xenserver hypervisor, you need to write a new
> storage provider.
> > >>>>>>>      c. A new interface, StorageProtocolTransformer, is added.
> > >>>>>>> The main purpose of this interface is to handle the difference
> > >>>>>>> between different storage protocols. It has four methods:
> > >>>>>>>           getInputParamNames: return a list of name of
> > >>>>>>> parameters for a particular protocol. E.g. NFS protocol has
> > >>>>>>> ["server", "path"], ISCSI has ["iqn", "lun"] etc. UI shouldn't
> > >>>>>>> hardcode these parameters any
> > >>>>>> more.
> > >>>>>>>           normalizeUserInput: given a user input from UI/API,
> > >>>>>>> need to validate the input, and break it apart, then store
> > >>>>>>> them into
> > >> database
> > >>>>>>>           getDataStoreTO/ getVolumeTO: each protocol can have
> > >>>>>>> its own volumeTO and primaryStorageTO. TO is the object will
> > >>>>>>> be passed down to resource, if your storage has extra
> > >>>>>>> information you want to pass to resource, these two methods
> > >>>>>>> are the place you can
> > >> override.
> > >>>>>>>      d. All the time-consuming API calls related to storage is =
async.
> > >>>>>>>
> > >>>>>>>     2. Minimal functionalities are implemented:
> > >>>>>>>          a. Can register a http template, without SSVM
> > >>>>>>>          b. Can register a NFS primary storage for xenserver
> > >>>>>>>          c. Can download a template into primary storage direct=
ly
> > >>>>>>>         d. Can create a volume from a template
> > >>>>>>>
> > >>>>>>>     3. All about test:
> > >>>>>>>         a. TestNG test framework is used, as it can provide
> > >>>>>>> parameter for each test case. For integration test, we need to
> > >>>>>>> know ip address of hypervisor host, the host uuid(if it's
> > >>>>>>> xenserver), the primary storage url, the template url etc.
> > >>>>>>> These configurations are better to be parameterized, so for
> > >>>>>>> each test run, we don't need to modify test case itself,
> > >>>>>>> instead, we provide a test configuration file for each test
> > >>>>>>> run. TestNG framework already has this functionality, I just
> > >>>>>> reuse it.
> > >>>>>>>         b. Every pieces of code can be unit tested, which means=
:
> > >>>>>>>               b.1 the xcp plugin can be unit tested. I wrote a
> > >>>>>>> small python code, called mockxcpplugin.py, which can directly
> > >>>>>>> call xcp
> > >>>>>> plugin.
> > >>>>>>>               b.2 direct agent hypervisor resource can be teste=
d.
> > >>>>>>> I wrote a mock agent manger, which can load and initialize
> > >>>>>>> hypervisor resource, and also can send command to resource.
> > >>>>>>>               b.3 a storage integration test maven project is
> > >>>>>>> created, which can test the whole storage subsystem, such as
> > >>>>>>> create volume from template, which including both image and
> > >>>>>>> volume
> > >>>>>> components.
> > >>>>>>>         A new section, called "how to test", is added into
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsys
> > >>>>>> t
> > >>>>>>> em+2.0,
> > >>>>>>> please check it out.
> > >>>>>>>
> > >>>>>>>    The code is on the javelin branch, the maven projects whose
> > >>>>>>> name starting from cloud-engine-storage-* are the code related
> > >>>>>>> to storage subsystem. Most of the primary storage code is in
> > >>>>>>> cloud-engine-storage-volume project.
> > >>>>>>>     Any feedback/comment is appreciated.
> > >>>>>>>
> > >>>
> > >