cloudstack-dev mailing list archives

From Edison Su <Edison...@citrix.com>
Subject RE: new storage framework update
Date Tue, 08 Jan 2013 22:59:50 GMT


> -----Original Message-----
> From: John Burwell [mailto:jburwell@basho.com]
> Sent: Tuesday, January 08, 2013 10:59 AM
> To: cloudstack-dev@incubator.apache.org
> Subject: Re: new storage framework update
> 
> Edison,
> 
> In reviewing the javelin, I feel that there is a missing abstraction.  At the
> lowest level, storage operations are the storage, retrieval, deletion, and
> listing of byte arrays stored at a particular URI.  In order to implement this
> concept in the current Javelin branch, 3-5 strategy classes must be implemented
> to perform the following low-level operations:
> 
>    * open(URI aDestinationURI): OutputStream throws IOException
>    * write(URI aDestinationURI, OutputStream anOutputStream) throws
> IOException
>    * list(URI aDestinationURI) : Set<URI> throws IOException
>    * delete(URI aDestinationURI) : boolean throws IOException
> 
> The logic for each of these strategies will be identical which will lead to the
> creation of a support class + glue code (i.e. either individual adapter classes

If the lowest-level API is too opaque, for example taking a single URI as its only parameter,
I wonder whether it will make the implementation more complicated than it sounds.
For example, the primary storage driver has at least three APIs (createVolumeFromTemplate,
createDataDisk, deleteVolume) and two snapshot-related APIs (createSnapshot, deleteSnapshot).

How would we encode these operations into simple write/delete APIs? If one URI carries too much
information, then at the end of the day the receiving side (the code in the hypervisor resource),
which is responsible for decoding the URI, becomes complicated. That is the main reason I decided
to use more specific APIs instead of one opaque URI.
It is true that if the API is too specific, people need to implement a ton of APIs (mainly
imagedatastoredriver, primarydatastoredriver, backupdatastoredriver) all over the place.
Which one is better? Everyone is welcome to join the discussion.
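
To make the trade-off concrete, here is a minimal Java sketch of the two styles. The driver method names come from this thread, but their parameter and return types, the names PrimaryDataStoreDriverSketch and UriRequestDecoder, the storage:// scheme, and the "op"/"template" query keys are all assumptions made up for illustration, not code from the javelin branch.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Style A: operation-specific driver API. The method names come from this
// thread; parameter and return types are assumptions for illustration.
interface PrimaryDataStoreDriverSketch {
    String createVolumeFromTemplate(String templateId, String volumeName);
    String createDataDisk(String volumeName, long sizeInBytes);
    boolean deleteVolume(String volumeId);
    String createSnapshot(String volumeId);
    boolean deleteSnapshot(String snapshotId);
}

// Style B: one opaque URI. The receiver has to recover the intent itself.
// The "op" and "template" query keys are hypothetical.
final class UriRequestDecoder {
    private UriRequestDecoder() {}

    // Splits the URI into its path plus every key=value pair of the query.
    static Map<String, String> decode(URI request) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", request.getPath());
        String query = request.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Malformed parameter: " + pair);
                }
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        // Without an explicit operation the receiver cannot tell what to do.
        if (fields.get("op") == null) {
            throw new IllegalArgumentException("Missing operation in " + request);
        }
        return fields;
    }
}
```

With style B, every new operation adds another convention that both sender and receiver must agree on; with style A, the compiler checks that agreement, at the cost of more methods to implement.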


> or a class that implements a ton of interfaces).  In addition to this added
> complexity, this segmented approach prevents the implementation of
> common, logical storage features such as ACL enforcement and asset

This is a good question: how do we share code across multiple components? For example, one
storage can be used as both primary storage and backup storage. In the current code, the developer
needs to implement both primarydatastoredriver and backupdatastoredriver. To share code between
these two drivers if needed, I think the developer can write one driver that implements
both interfaces.
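
A minimal sketch of that idea, assuming two simplified driver interfaces. The method names createVolume and backupVolume, the SharedStorageDriver class, and the in-memory bookkeeping are hypothetical stand-ins, not the real javelin interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for the two driver interfaces (methods are assumed).
interface PrimaryDataStoreDriver {
    String createVolume(String name);
}

interface BackupDataStoreDriver {
    String backupVolume(String volumeId);
}

// One class implements both interfaces, so both roles share the same helper.
final class SharedStorageDriver implements PrimaryDataStoreDriver, BackupDataStoreDriver {
    private final Map<String, String> objects = new LinkedHashMap<>();
    private int nextId = 1;

    // Shared code path: allocate an id and record the object.
    private String store(String kind, String description) {
        String id = kind + "-" + nextId++;
        objects.put(id, description);
        return id;
    }

    @Override
    public String createVolume(String name) {
        return store("volume", name);
    }

    @Override
    public String backupVolume(String volumeId) {
        if (!objects.containsKey(volumeId)) {
            throw new IllegalArgumentException("Unknown volume: " + volumeId);
        }
        return store("backup", "copy of " + volumeId);
    }
}
```

The same instance can then be handed to code that expects either interface.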

> encryption.  With a common representation of a StorageDevice that operates
> on the standard Java I/O model, we can layer in cross-cutting storage
> operations in a consistent manner.

I agree that it would be nice to have a standard device model, like the POSIX file system API
in the Unix world. But I haven't figured out how to generalize all the operations on the storage,
as I mentioned above.
I can see that createVolumeFromTemplate could be generalized as a link API, but how about
taking a snapshot? And what would handle the difference between deleting a volume and deleting
a snapshot, if they use the same delete API?

> 
> Based on this line of thought, I propose the addition of following notions to
> the storage framework:
> 
>    * StorageType (Enumeration)
>       * BLOCK (raw block devices such as iSCSI, NBD, etc)
>       * FILE_SYSTEM (devices addressable through the filesystem such as local
> disks, NFS, etc)
>       * OBJECT (object stores such as S3 and Swift)
>    * StorageDevice (interface)
>        * open(URI aDestinationURI): OutputStream throws IOException
>        * write(URI aDestinationURI, OutputStream anOutputStream) throws
> IOException
>        * list(URI aDestinationURI) : Set<URI> throws IOException
>        * delete(URI aDestinationURI) : boolean throws IOException
>        * getType() : StorageType
>    * UnsupportedStorageDevice (unchecked exception): Thrown when an
> unsuitable device type is provided to a storage service.
> 
> All operations on the higher level storage services (e.g. ImageService) would
> accept a StorageDevice parameter on their operations.  Using the type
> property, services can determine whether or not the passed device is
> suitable (e.g. guarding against the use of an object store such as S3 as a VM
> disk) -- throwing an UnsupportedStorageDevice exception when a device is
> unsuitable for the requested operation.  The services would then perform all storage
> operations through the passed StorageDevice.
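
For reference, the proposal quoted above could be sketched in Java roughly as follows. The type names and the open/list/delete/getType methods are taken from the proposal (write is left out for brevity); InMemoryStorageDevice, VolumeServiceSketch, and the mem:// URIs used with them are hypothetical and exist only to make the sketch executable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

enum StorageType { BLOCK, FILE_SYSTEM, OBJECT }

// Subset of the proposed interface; write(...) is omitted in this sketch.
interface StorageDevice {
    OutputStream open(URI destination) throws IOException;
    Set<URI> list(URI destination) throws IOException;
    boolean delete(URI destination) throws IOException;
    StorageType getType();
}

// Unchecked, as in the proposal.
class UnsupportedStorageDevice extends RuntimeException {
    UnsupportedStorageDevice(String message) { super(message); }
}

// Hypothetical in-memory device: bytes are stored when the stream is closed.
final class InMemoryStorageDevice implements StorageDevice {
    private final Map<URI, byte[]> blobs = new TreeMap<>();
    private final StorageType type;

    InMemoryStorageDevice(StorageType type) { this.type = type; }

    @Override
    public OutputStream open(final URI destination) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                blobs.put(destination, toByteArray());
            }
        };
    }

    @Override
    public Set<URI> list(URI destination) {
        // Treats the argument as a prefix of the stored URIs.
        Set<URI> matches = new TreeSet<>();
        for (URI candidate : blobs.keySet()) {
            if (candidate.toString().startsWith(destination.toString())) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    @Override
    public boolean delete(URI destination) {
        return blobs.remove(destination) != null;
    }

    @Override
    public StorageType getType() { return type; }
}

// Hypothetical service-side guard based on the device type.
final class VolumeServiceSketch {
    private VolumeServiceSketch() {}

    static void requireDiskCapable(StorageDevice device) {
        if (device.getType() == StorageType.OBJECT) {
            throw new UnsupportedStorageDevice("Object stores cannot back VM disks");
        }
    }
}
```

Note that nothing in this interface says whether a URI names a volume, a template, or a snapshot, which is the concern raised earlier in this reply.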
> 
> One potential gap is security.  I do not know whether or not authorization
> decisions are assumed to occur up the stack from the storage engine or as
> part of it.
> 
> Thanks,
> -John
> 
> P.S. I apologize for taking so long to push my feedback.  I am just getting back
> on station from the birth of our second child.


Congratulations! Thanks for your great feedback.

> 
> On Dec 28, 2012, at 8:09 PM, Edison Su <Edison.su@citrix.com> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> >> Sent: Friday, December 28, 2012 2:56 PM
> >> To: cloudstack-dev@incubator.apache.org
> >> Subject: Re: new storage framework update
> >>
> >> Thanks. I'm trying to picture how this will change the existing code.
> >> I think it is something i will need a real example to understand.
> >> Currently we pass a
> > Yah, the example code is in these files:
> > XenNfsConfigurator
> > DefaultPrimaryDataStoreDriverImpl
> > DefaultPrimaryDatastoreProviderImpl
> > VolumeServiceImpl
> > DefaultPrimaryDataStore
> > XenServerStorageResource
> >
> > You can start from the volumeServiceTest -> createVolumeFromTemplate test
> > case.
> >
> >> storageFilerTO and/or volumeTO from the serverto the agent, and the
> >> agent
> > This model has not changed; what changed are the commands sent to the
> > resource. Right now, each storage protocol can send its own command to
> > the resource.
> > All the storage-related commands are put under the
> > org.apache.cloudstack.storage.command package. Take
> > CopyTemplateToPrimaryStorageCmd as an example:
> > it has a field called ImageOnPrimayDataStoreTO, which contains a
> > PrimaryDataStoreTO. PrimaryDataStoreTO contains the basic information
> > about a primary storage. If you need to send extra information to the
> > resource, you can subclass PrimaryDataStoreTO, e.g. NfsPrimaryDataStoreTO,
> > which contains the NFS server IP and NFS path. In the same way, one can
> > write a CLVMPrimaryDataStoreTO, which contains CLVM's own special
> > information if needed. Having each protocol use its own TO can simplify
> > the code and make it easier to add new storage.
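
The TO subclassing described above could look roughly like this. The class names PrimaryDataStoreTO and NfsPrimaryDataStoreTO come from the quoted text; the fields, constructors, and the ResourceSketch class are assumptions for illustration only.

```java
// Base transfer object: the basic information about a primary storage.
// The uuid and protocol fields are assumed for this sketch.
class PrimaryDataStoreTO {
    private final String uuid;
    private final String protocol;

    PrimaryDataStoreTO(String uuid, String protocol) {
        this.uuid = uuid;
        this.protocol = protocol;
    }

    String getUuid() { return uuid; }
    String getProtocol() { return protocol; }
}

// Protocol-specific subclass carrying the extra NFS details.
final class NfsPrimaryDataStoreTO extends PrimaryDataStoreTO {
    private final String server;
    private final String path;

    NfsPrimaryDataStoreTO(String uuid, String server, String path) {
        super(uuid, "nfs");
        this.server = server;
        this.path = path;
    }

    String getServer() { return server; }
    String getPath() { return path; }
}

// Hypothetical resource-side code that uses the extra fields when present.
final class ResourceSketch {
    private ResourceSketch() {}

    static String describe(PrimaryDataStoreTO store) {
        if (store instanceof NfsPrimaryDataStoreTO) {
            NfsPrimaryDataStoreTO nfs = (NfsPrimaryDataStoreTO) store;
            return "mount " + nfs.getServer() + ":" + nfs.getPath();
        }
        return "generic " + store.getProtocol() + " store " + store.getUuid();
    }
}
```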
> >
> >> does all of the work. Do we still need things like
> >> LibvirtStorageAdaptor to do the work on the agent side of actually
> >> managing the volumes/pools and implementing them, connecting them to
> >> vms? So in implementing new storage we will need to write both a
> >> configurator and potentially a storage adaptor?
> >
> > Yes, those are the minimal requirements.
> >
> >> On Dec 27, 2012 6:41 PM, "Edison Su" <Edison.su@citrix.com> wrote:
> >>
> >>> Hi All,
> >>>     Before heading into holiday, I'd like to update the current
> >>> status of the new storage framework since last collab12.
> >>>    1. The class diagram of primary storage has evolved:
> >>>
> >>> https://cwiki.apache.org/confluence/download/attachments/30741569/storage.jpg?version=1&modificationDate=1356640617613
> >>>          Highlights of the current design:
> >>>          a.  One storage provider can cover multiple storage
> >>> protocols for multiple hypervisors. The default storage provider can
> >>> cover almost all the current primary storage protocols. In most
> >>> cases, you don't need to write a new storage provider; what you need
> >>> to do is write a new storage configurator. Writing a new storage
> >>> provider requires a lot of code, which we should avoid as much as
> >>> possible.
> >>>         b. A new type hierarchy, primaryDataStoreConfigurator, is added.
> >>> The configurator is a factory for primaryDataStore, which assembles
> >>> StorageProtocolTransformer, PrimaryDataStoreLifeCycle and
> >>> PrimaryDataStoreDriver for a PrimaryDataStore object, based on the
> >>> hypervisor type and the storage protocol.  For example, for nfs
> >>> primary storage on xenserver, there is a class called
> >>> XenNfsConfigurator, which puts DefaultXenPrimaryDataStoreLifeCycle,
> >>> NfsProtocolTransformer and DefaultPrimaryDataStoreDriverImpl into
> >>> DefaultPrimaryDataStore. One provider can only have one configurator
> >>> for each pair of hypervisor type and storage protocol. For example, if
> >>> you want to add another nfs protocol configurator for the xenserver
> >>> hypervisor, you need to write a new storage provider.
> >>>        c. A new interface, StorageProtocolTransformer, is added. The
> >>> main purpose of this interface is to handle the differences between
> >>> storage protocols. It has four methods:
> >>>             getInputParamNames: returns a list of parameter names
> >>> for a particular protocol. E.g. the NFS protocol has ["server", "path"],
> >>> ISCSI has ["iqn", "lun"] etc. The UI shouldn't hardcode these parameters
> >>> any more.
> >>>             normalizeUserInput: given user input from the UI/API, validates
> >>> the input, breaks it apart, then stores the parts in the database
> >>>             getDataStoreTO/ getVolumeTO: each protocol can have its
> >>> own volumeTO and primaryStorageTO. A TO is the object that will be passed
> >>> down to the resource; if your storage has extra information you want to
> >>> pass to the resource, these two methods are the place to override.
> >>>        d. All the time-consuming API calls related to storage are async.
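
As an illustration of item c above, a cut-down StorageProtocolTransformer for NFS might look like the sketch below. Only the method names getInputParamNames and normalizeUserInput and the ["server", "path"] parameter list come from the quoted text; the signatures, the nfs://server/path input format, and the class name NfsProtocolTransformerSketch are assumptions, and getDataStoreTO/getVolumeTO are omitted.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Two of the four methods; the signatures are assumed for this sketch.
interface StorageProtocolTransformer {
    List<String> getInputParamNames();
    Map<String, String> normalizeUserInput(String userInput);
}

final class NfsProtocolTransformerSketch implements StorageProtocolTransformer {
    @Override
    public List<String> getInputParamNames() {
        return Arrays.asList("server", "path");
    }

    // Validates an assumed "nfs://server/path" input and breaks it apart.
    // Persisting the parts to the database is left to the caller here.
    @Override
    public Map<String, String> normalizeUserInput(String userInput) {
        URI uri = URI.create(userInput);
        boolean valid = "nfs".equalsIgnoreCase(uri.getScheme())
                && uri.getHost() != null
                && uri.getPath() != null
                && !uri.getPath().isEmpty();
        if (!valid) {
            throw new IllegalArgumentException("Not a valid NFS url: " + userInput);
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("server", uri.getHost());
        params.put("path", uri.getPath());
        return params;
    }
}
```

A UI could call getInputParamNames to render the input form instead of hardcoding the field names per protocol.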
> >>>
> >>>       2. Minimal functionality is implemented:
> >>>            a. Can register an HTTP template, without SSVM
> >>>            b. Can register an NFS primary storage for xenserver
> >>>            c. Can download a template into primary storage directly
> >>>            d. Can create a volume from a template
> >>>
> >>>       3. All about testing:
> >>>           a. The TestNG test framework is used, as it can provide
> >>> parameters for each test case. For integration tests, we need to know
> >>> the ip address of the hypervisor host, the host uuid (if it's xenserver),
> >>> the primary storage url, the template url etc. These configurations are
> >>> better parameterized, so for each test run we don't need to
> >>> modify the test case itself; instead, we provide a test configuration
> >>> file for each test run. The TestNG framework already has this
> >>> functionality, I just reuse it.
> >>>           b. Every piece of code can be unit tested, which means:
> >>>                 b.1 the xcp plugin can be unit tested. I wrote a
> >>> small python script, called mockxcpplugin.py, which can directly call
> >>> the xcp plugin.
> >>>                 b.2 the direct agent hypervisor resource can be tested.
> >>> I wrote a mock agent manager, which can load and initialize the
> >>> hypervisor resource, and can also send commands to the resource.
> >>>                 b.3 a storage integration test maven project is
> >>> created, which can test the whole storage subsystem, such as creating
> >>> a volume from a template, which includes both the image and volume
> >>> components.
> >>>           A new section, called "how to test", is added to
> >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0,
> >>> please check it out.
> >>>
> >>>      The code is on the javelin branch; the maven projects whose
> >>> names start with cloud-engine-storage-* contain the code related to
> >>> the storage subsystem. Most of the primary storage code is in the
> >>> cloud-engine-storage-volume project.
> >>>       Any feedback/comment is appreciated.
> >>>

