incubator-cloudstack-dev mailing list archives

From John Burwell <>
Subject Re: new storage framework update
Date Wed, 09 Jan 2013 04:50:30 GMT

Please see my thoughts in-line below.  I apologize in advance for the S3-centric nature of my
examples -- it happens to be top of mind for obvious reasons ...


On Jan 8, 2013, at 5:59 PM, Edison Su <> wrote:

>> -----Original Message-----
>> From: John Burwell []
>> Sent: Tuesday, January 08, 2013 10:59 AM
>> To:
>> Subject: Re: new storage framework update
>> Edison,
>> In reviewing the javelin, I feel that there is a missing abstraction.  At the
>> lowest level, storage operations are the storage, retrieval, deletion, and
>> listing of byte arrays stored at a particular URI.  In order to implement this
>> concept in the current Javelin branch, 3-5 strategy classes must be implemented
>> to perform the following low-level operations:
>>   * open(URI aDestinationURI): OutputStream throws IOException
>>   * write(URI aDestinationURI, OutputStream anOutputStream) throws
>> IOException
>>   * list(URI aDestinationURI) : Set<URI> throws IOException
>>   * delete(URI aDestinationURI) : boolean throws IOException
>> The logic for each of these strategies will be identical which will lead to the
>> creation of a support class + glue code (i.e. either individual adapter classes

I realize that I omitted a couple of definitions in my original email.  First, the StorageDevice
interface would most likely be implemented on a domain object that also contains configuration
information for a resource.  For example, the S3Impl class would also implement StorageDevice.  On reflection
(and a little pseudo coding), I would also like to refine my originally proposed StorageDevice
interface as follows:

   * void read(URI aURI, OutputStream anOutputStream) throws IOException
   * void write(URI aURI, InputStream anInputStream)  throws IOException
   * Set<URI> list(URI aURI)  throws IOException
   * boolean delete(URI aURI) throws IOException
   * StorageDeviceType getType()
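
As a minimal sketch of the refined interface in compilable Java, with a toy implementation to exercise it: InMemoryStorageDevice, StorageDeviceDemo and the example.vhd file name are hypothetical names introduced only for illustration, and the choice of OBJECT as the device type is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

enum StorageDeviceType { BLOCK, FILE_SYSTEM, OBJECT }

interface StorageDevice {
    void read(URI aURI, OutputStream anOutputStream) throws IOException;
    void write(URI aURI, InputStream anInputStream) throws IOException;
    Set<URI> list(URI aURI) throws IOException;
    boolean delete(URI aURI) throws IOException;
    StorageDeviceType getType();
}

// Hypothetical device that keeps everything in a map, for illustration only.
final class InMemoryStorageDevice implements StorageDevice {
    private final Map<URI, byte[]> contents = new ConcurrentHashMap<>();

    @Override
    public void read(URI aURI, OutputStream anOutputStream) throws IOException {
        byte[] bytes = contents.get(aURI);
        if (bytes == null) {
            throw new FileNotFoundException("No content at " + aURI);
        }
        anOutputStream.write(bytes);
    }

    @Override
    public void write(URI aURI, InputStream anInputStream) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int count;
        while ((count = anInputStream.read(chunk)) != -1) {
            buffer.write(chunk, 0, count);
        }
        contents.put(aURI, buffer.toByteArray());
    }

    @Override
    public Set<URI> list(URI aURI) {
        // Treat the path of the given URI as a prefix.
        Set<URI> result = new TreeSet<>();
        for (URI key : contents.keySet()) {
            if (key.getPath().startsWith(aURI.getPath())) {
                result.add(key);
            }
        }
        return result;
    }

    @Override
    public boolean delete(URI aURI) {
        return contents.remove(aURI) != null;
    }

    @Override
    public StorageDeviceType getType() {
        return StorageDeviceType.OBJECT; // assumption for this toy device
    }
}

final class StorageDeviceDemo {
    public static void main(String[] args) throws IOException {
        StorageDevice device = new InMemoryStorageDevice();
        URI uri = URI.create("/template/2/200/example.vhd");
        byte[] payload = "disk image bytes".getBytes(StandardCharsets.UTF_8);
        device.write(uri, new ByteArrayInputStream(payload));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        device.read(uri, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(device.list(URI.create("/template/2/")));
        System.out.println(device.delete(uri));
    }
}
```

A real device would stream rather than buffer whole objects in memory; the map is only there to keep the sketch self-contained.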

> If the lowest-level API is too opaque, e.g. taking a single URI as its parameter, I wonder whether it may make
the implementation more complicated than it sounds.
> For example, there are at least 3 APIs for the primary storage driver: createVolumeFromTemplate,
createDataDisk, deleteVolume, and two snapshot-related APIs: createSnapshot, deleteSnapshot.

> How do we encode the above operations into simple write/delete APIs? If one URI contains too
much information, then at the end of the day the receiving side (the code in the hypervisor resource),
which is responsible for decoding the URI, becomes complicated.  That's the main reason I
decided to use more specific APIs instead of one opaque URI.
> It's true that if the API is too specific, people need to implement a ton of APIs (mainly
imagedatastoredriver, primarydatastoredriver, backupdatastoredriver) all over the place.

> Which one is better? People can jump into the discussion.

The URI scheme should produce logical, unique, and reversible values associated with the type of
resource being stored.  For example, the general forms of template URIs would be "/template/<account_id>/<template_id>/"
and "/template/<account_id>/<template_id>/<uuid>.vhd" .  Therefore, for
account id 2, template id 200, the resource would be assigned a URI of
"/template/2/200/  The StorageDevice implementation translates the logical
URI to a physical representation.  Using S3 as an example, the StorageDevice is configured
to use bucket jsb-cloudstack at endpoint  The S3 storage device would translate
the URI to s3://jsb-cloudstack/templates/2/200/  For an NFS storage device
mounted on nfs://localhost/cloudstack, the StorageDevice would translate the logical URI to
 In short, I believe that we can devise a simple scheme that allows the StorageDevice to treat
the URI path relative to its root.
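
A minimal sketch of such a translation, assuming the physical URI is simply the device root followed by the logical path.  LogicalUriTranslator and its method names are hypothetical; the roots are the S3 bucket and NFS mount named above, and the physical URIs it produces are outputs of this sketch, not values taken from a real device.

```java
import java.net.URI;

// Hypothetical helper: maps logical URIs onto a device root and back again.
final class LogicalUriTranslator {
    private final URI root;

    LogicalUriTranslator(URI aRoot) {
        // URI#resolve replaces the last path segment unless the base ends with "/".
        String text = aRoot.toString();
        this.root = URI.create(text.endsWith("/") ? text : text + "/");
    }

    // Logical -> physical: treat the logical path as relative to the root.
    URI toPhysical(URI aLogicalURI) {
        String path = aLogicalURI.getPath();
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return root.resolve(relative);
    }

    // Physical -> logical: strip the root again (the "reversible" property).
    URI toLogical(URI aPhysicalURI) {
        URI relative = root.relativize(aPhysicalURI);
        if (relative.isAbsolute()) {
            // relativize returns its argument unchanged when the root is not a prefix
            throw new IllegalArgumentException(aPhysicalURI + " is not under " + root);
        }
        return URI.create("/" + relative.getPath());
    }
}

final class LogicalUriDemo {
    public static void main(String[] args) {
        URI logical = URI.create("/template/2/200/");

        LogicalUriTranslator s3 = new LogicalUriTranslator(URI.create("s3://jsb-cloudstack"));
        LogicalUriTranslator nfs = new LogicalUriTranslator(URI.create("nfs://localhost/cloudstack"));

        System.out.println(s3.toPhysical(logical));
        System.out.println(nfs.toPhysical(logical));
        System.out.println(nfs.toLogical(nfs.toPhysical(logical)));
    }
}
```

Keeping the mapping purely path-based is what makes it reversible: a service can recover the logical URI from anything the device lists without knowing which device type produced it.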

To my mind, the createVolumeFromTemplate is decomposable into a series of StorageDevice#read
and StorageDevice#write operations which would be issued by the VolumeManager service such
as the following:

public void createVolumeFromTemplate(Template aTemplate, StorageDevice aTemplateDevice,
        Volume aVolume, StorageDevice aVolumeDevice) {

    try {

        // Only block and file system devices can back a volume
        if (aVolumeDevice.getType() != StorageDeviceType.BLOCK
                && aVolumeDevice.getType() != StorageDeviceType.FILE_SYSTEM)
            throw new UnsupportedStorageDeviceException(…);

        // Pull the template from template device into a temporary directory
        final File aTemplateDirectory = new File(<template temp path>);

        // Non-DRY -- likely a candidate for a TemplateService#downloadTemplate method
        aTemplateDevice.read(new URI("/templates/<account_id>/<template_id>/"),
                new FileOutputStream(new File(aTemplateDirectory, "")));
        aTemplateDevice.read(new URI("/templates/<account_id>/<template_id>/<template_uuid>.vhd"),
                new FileOutputStream(new File(aTemplateDirectory, "<template_uuid>.vhd")));

        // Perform operations with hypervisor as necessary to register storage which yields
        // anInputStream (possibly a List<InputStream>)

        aVolumeDevice.write(new URI("/volume/<account_id>/<volume_id>"), anInputStream);

    } catch (IOException e) {

        // Log and handle the error ...

    } finally {

        // Close resources ...

    }

}

Depending on the capabilities of the hypervisor's Java API, the temporary files may not be
required, and an OutputStream could be piped directly to an InputStream.
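
One way to avoid the temporary files, sketched here with the standard java.io piped streams: the source device writes into a PipedOutputStream on a helper thread while the target device reads from the connected PipedInputStream.  DeviceCopier, InMemoryStorageDevice and the example URIs are hypothetical, and the interface is trimmed to the two methods the sketch needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Trimmed to the two methods this sketch needs.
interface StorageDevice {
    void read(URI aURI, OutputStream anOutputStream) throws IOException;
    void write(URI aURI, InputStream anInputStream) throws IOException;
}

// Hypothetical in-memory stand-in for a real device.
final class InMemoryStorageDevice implements StorageDevice {
    private final Map<URI, byte[]> contents = new ConcurrentHashMap<>();

    @Override
    public void read(URI aURI, OutputStream anOutputStream) throws IOException {
        byte[] bytes = contents.get(aURI);
        if (bytes == null) {
            throw new FileNotFoundException("No content at " + aURI);
        }
        anOutputStream.write(bytes);
    }

    @Override
    public void write(URI aURI, InputStream anInputStream) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int count;
        while ((count = anInputStream.read(chunk)) != -1) {
            buffer.write(chunk, 0, count);
        }
        contents.put(aURI, buffer.toByteArray());
    }
}

final class DeviceCopier {
    // Streams aSourceURI on aSource into aTargetURI on aTarget without a temporary file.
    static void copy(StorageDevice aSource, URI aSourceURI,
                     StorageDevice aTarget, URI aTargetURI) throws IOException {
        PipedInputStream pipeIn = new PipedInputStream();
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);
        AtomicReference<IOException> readFailure = new AtomicReference<>();

        // Piped streams need the producer and the consumer on different threads.
        Thread reader = new Thread(() -> {
            try (OutputStream out = pipeOut) {
                aSource.read(aSourceURI, out);
            } catch (IOException e) {
                readFailure.set(e);
            }
        });
        reader.start();

        try (InputStream in = pipeIn) {
            aTarget.write(aTargetURI, in);
        }
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while copying", e);
        }
        // A failed read may leave a truncated target behind; a real service would clean up.
        if (readFailure.get() != null) {
            throw readFailure.get();
        }
    }
}

final class PipedCopyDemo {
    public static void main(String[] args) throws IOException {
        InMemoryStorageDevice templates = new InMemoryStorageDevice();
        InMemoryStorageDevice volumes = new InMemoryStorageDevice();
        URI template = URI.create("/templates/2/200/example.vhd");
        URI volume = URI.create("/volume/2/300");

        templates.write(template, new ByteArrayInputStream(new byte[64 * 1024]));
        DeviceCopier.copy(templates, template, volumes, volume);

        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        volumes.read(volume, copied);
        System.out.println(copied.size() + " bytes copied");
    }
}
```

The extra thread is the cost of this approach; whether it beats a temporary file depends on how the hypervisor API hands over its data.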

>> or a class that implements a ton of interfaces).  In addition to this added
>> complexity, this segmented approach prevents the implementation of
>> common, logical storage features such as ACL enforcement and asset
> This is a good question: how to share code across multiple components. For example,
one storage can be used as both primary storage and backup storage. In the current code, a developer
needs to implement both primarydataStoredriver and backupdatastoredriver. To share
code between these two drivers if needed, I think the developer can write one driver which implements
both interfaces.

In my opinion, having storage drivers classify their own usage limits functionality and composability.
Hence, my thought is that the StorageDevice should describe its capabilities -- allowing
the various services (e.g. Image, Template, Volume, etc) to determine whether or not the passed
storage devices can support the requested operation.
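
A rough sketch of that check, assuming the device type is the only capability a service inspects.  The VolumeService shape, its SUPPORTED set and the createVolume method are hypothetical, and StorageDevice is trimmed to getType() for brevity.

```java
import java.util.EnumSet;
import java.util.Set;

enum StorageDeviceType { BLOCK, FILE_SYSTEM, OBJECT }

// Trimmed to the one method this sketch needs.
interface StorageDevice {
    StorageDeviceType getType();
}

// Unchecked, so service signatures stay free of storage-specific exceptions.
class UnsupportedStorageDeviceException extends RuntimeException {
    UnsupportedStorageDeviceException(String aMessage) {
        super(aMessage);
    }
}

// Hypothetical service: it declares what it can work with and guards every operation.
final class VolumeService {
    private static final Set<StorageDeviceType> SUPPORTED =
            EnumSet.of(StorageDeviceType.BLOCK, StorageDeviceType.FILE_SYSTEM);

    void createVolume(StorageDevice aVolumeDevice) {
        requireSupported(aVolumeDevice);
        // ... issue read/write operations against aVolumeDevice here ...
    }

    private static void requireSupported(StorageDevice aDevice) {
        if (!SUPPORTED.contains(aDevice.getType())) {
            throw new UnsupportedStorageDeviceException(
                    "Volumes require one of " + SUPPORTED + " but got " + aDevice.getType());
        }
    }
}

final class CapabilityDemo {
    public static void main(String[] args) {
        VolumeService service = new VolumeService();
        service.createVolume(() -> StorageDeviceType.FILE_SYSTEM); // accepted
        try {
            service.createVolume(() -> StorageDeviceType.OBJECT);  // e.g. S3 as a VM disk
        } catch (UnsupportedStorageDeviceException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The device never states what it is used for; each service decides which device types it can work with.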

>> encryption.  With a common representation of a StorageDevice that operates
>> on the standard Java I/O model, we can layer in cross-cutting storage
>> operations in a consistent manner.
> I agree that it would be nice to have a standard device model, like the POSIX file system API in
the Unix world. But I haven't figured out how to generalize all the operations on the storage,
as I mentioned above.
> I can see that createvolumefromtemplate can be generalized as a link API, but how
about taking a snapshot? And who will handle the difference between delete volume and
delete snapshot, if they are using the same delete API?

The following is a snippet that would be part of the SnapshotService to take a snapshot:

	// Ask the hypervisor to take a snapshot, which yields anInputStream (e.g. FileInputStream)

	aSnapshotDevice.write(new URI("/snapshots/<account_id>/<snapshot_id>"), anInputStream);

Ultimately, a snapshot can be exported to a single file or OutputStream which can then be written
back out to a StorageDevice.  For deleting a snapshot, the following snippet would perform
the deletion in the SnapshotService:

	// Ask the hypervisor to delete the snapshot ...

	aSnapshotDevice.delete(new URI("/snapshots/<account_id>/<snapshot_id>"));

Finally, the following snippet would delete a volume in the VolumeService:

	// Ask the hypervisor to delete the volume

	aVolumeDevice.delete(new URI("/volumes/<account_id>/<volume_id>"));

In summary, I believe that the opaque operations specified in the StorageDevice interface
can accomplish these goals if the following approaches are employed:
	* Logical, reversible URIs are constructed by the storage services.  These URIs are translated
by the StorageDevice implementation to the semantics of the underlying device
	* The storage service methods break their logic down into a series of operations against one
or more StorageDevices.  These operations should conform to common Java idioms because StorageDevice
is built on the standard Java I/O model (i.e. InputStream, OutputStream, URI).


>> Based on this line of thought, I propose the addition of following notions to
>> the storage framework:
>>   * StorageType (Enumeration)
>>      * BLOCK (raw block devices such as iSCSI, NBD, etc)
>>      * FILE_SYSTEM (devices addressable through the filesystem such as local
>> disks, NFS, etc)
>>      * OBJECT (object stores such as S3 and Swift)
>>   * StorageDevice (interface)
>>       * open(URI aDestinationURI): OutputStream throws IOException
>>       * write(URI aDestinationURI, OutputStream anOutputStream) throws
>> IOException
>>       * list(URI aDestinationURI) : Set<URI> throws IOException
>>       * delete(URI aDestinationURI) : boolean throws IOException
>>       * getType() : StorageType
>>   * UnsupportedStorageDevice (unchecked exception): Thrown when an
>> unsuitable device type is provided to a storage service.
>> All operations on the higher level storage services (e.g. ImageService) would
>> accept a StorageDevice parameter on their operations.  Using the type
>> property, services can determine whether or not the passed device is
>> suitable (e.g. guarding against the use of an object store such as S3 as a VM disk) --
>> throwing an UnsupportedStorageDevice exception when a device is unsuitable
>> for the requested operation.  The services would then perform all storage
>> operations through the passed StorageDevice.
>> One potential gap is security.  I do not know whether or not authorization
>> decisions are assumed to occur up the stack from the storage engine or as
>> part of it.
>> Thanks,
>> -John
>> P.S. I apologize for taking so long to push my feedback.  I am just getting back
>> on station from the birth of our second child.
> Congratulations! Thanks for your great feedback.
>> On Dec 28, 2012, at 8:09 PM, Edison Su <> wrote:
>>>> -----Original Message-----
>>>> From: Marcus Sorensen []
>>>> Sent: Friday, December 28, 2012 2:56 PM
>>>> To:
>>>> Subject: Re: new storage framework update
>>>> Thanks. I'm trying to picture how this will change the existing code.
>>>> I think it is something i will need a real example to understand.
>>>> Currently we pass a
>>> Yah, the example code is in these files:
>>> XenNfsConfigurator
>>> DefaultPrimaryDataStoreDriverImpl
>>> DefaultPrimaryDatastoreProviderImpl
>>> VolumeServiceImpl
>>> DefaultPrimaryDataStore
>>> XenServerStorageResource
>>> You can start from volumeServiceTest -> createVolumeFromTemplate test
>> case.
>>>> storageFilerTO and/or volumeTO from the serverto the agent, and the
>>>> agent
>>> This model is not changed; what changed are the commands sent to the
>> resource. Right now, each storage protocol can send its own command to the
>> resource.
>>> All the storage related commands are put under
>> package. Take
>> CopyTemplateToPrimaryStorageCmd as an example,
>>> It has a field called ImageOnPrimayDataStoreTO, which contains a
>> PrimaryDataStoreTO. PrimaryDataStoreTO contains the basic information
>> about a primary storage. If you need to send extra information to the resource, you
>> can subclass PrimaryDataStoreTO, e.g. NfsPrimaryDataStoreTO, which
>> contains the nfs server ip and nfs path. In this way, one can write a
>> CLVMPrimaryDataStoreTO, which contains clvm's own special information if
>> needed.   Having different protocols use different TOs can simplify the code and
>> make it easier to add new storage.
>>>> does all of the work. Do we still need things like
>>>> LibvirtStorageAdaptor to do the work on the agent side of actually
>>>> managing the volumes/pools and implementing them, connecting them
>> to
>>>> vms? So in implementing new storage we will need to write both a
>>>> configurator and potentially a storage adaptor?
>>> Yes, that's minimal requirements.
>>>> On Dec 27, 2012 6:41 PM, "Edison Su" <> wrote:
>>>>> Hi All,
>>>>>    Before heading into holiday, I'd like to update the current
>>>>> status of the new storage framework since last collab12.
>>>>>   1. Class diagram of primary storage is evolved:
>>>> r
>>>> age.jpg?version=1&modificationDate=1356640617613
>>>>>         Highlight the current design:
>>>>>         a.  One storage provider can cover multiple storage
>>>>> protocols for multiple hypervisors. The default storage provider can
>>>>> almost cover all the current primary storage protocols. In most
>>>>> cases, you don't need to write a new storage provider; what you need
>>>>> to do is write a new storage configurator. Writing a new storage
>>>>> provider requires a lot of code, which we should avoid as
>>>>> much as
>>>> possible.
>>>>>        b. A new type hierarchy, primaryDataStoreConfigurator, is added.
>>>>> The configurator is a factory for primaryDataStore, which assembles
>>>>> StorageProtocolTransformer, PrimaryDataStoreLifeCycle and
>>>>> PrimaryDataStoreDriver for the PrimaryDataStore object, based on the
>>>>> hypervisor type and the storage protocol.  For example, for nfs
>>>>> primary storage on xenserver, there is a class called
>>>>> XenNfsConfigurator, which puts DefaultXenPrimaryDataStoreLifeCycle,
>>>>> NfsProtocolTransformer and DefaultPrimaryDataStoreDriverImpl into
>>>>> DefaultPrimaryDataStore. One provider can only have one configurator
>>>>> for a pair of hypervisor type and storage protocol. For example, if
>>>>> you want to add a new nfs protocol configurator for xenserver
>>>>> hypervisor, you need to write a new storage provider.
>>>>>       c. A new interface, StorageProtocolTransformer, is added. The
>>>>> main purpose of this interface is to handle the difference between
>>>>> different storage protocols. It has four methods:
>>>>>            getInputParamNames: return a list of name of parameters
>>>>> for a particular protocol. E.g. NFS protocol has ["server", "path"],
>>>>> ISCSI has ["iqn", "lun"] etc. UI shouldn't hardcode these parameters
>>>>> any
>>>> more.
>>>>>            normalizeUserInput: given a user input from UI/API, need
>>>>> to validate the input, and break it apart, then store them into database
>>>>>            getDataStoreTO/ getVolumeTO: each protocol can have its
>>>>> own volumeTO and primaryStorageTO. TO is the object will be passed
>>>>> down to resource, if your storage has extra information you want to
>>>>> pass to resource, these two methods are the place you can override.
>>>>>       d. All the time-consuming API calls related to storage are async.
>>>>>      2. Minimal functionalities are implemented:
>>>>>           a. Can register a http template, without SSVM
>>>>>           b. Can register a NFS primary storage for xenserver
>>>>>           c. Can download a template into primary storage directly
>>>>>          d. Can create a volume from a template
>>>>>      3. All about test:
>>>>>          a. TestNG test framework is used, as it can provide
>>>>> parameter for each test case. For integration test, we need to know
>>>>> ip address of hypervisor host, the host uuid(if it's xenserver), the
>>>>> primary storage url, the template url etc. These configurations are
>>>>> better to be parameterized, so for each test run, we don't need to
>>>>> modify test case itself, instead, we provide a test configuration
>>>>> file for each test run. TestNG framework already has this
>>>>> functionality, I just
>>>> reuse it.
>>>>>          b. Every piece of code can be unit tested, which means:
>>>>>                b.1 the xcp plugin can be unit tested. I wrote a
>>>>> small python code, called, which can directly call
>>>>> xcp
>>>> plugin.
>>>>>                b.2 direct agent hypervisor resource can be tested.
>>>>> I wrote a mock agent manager, which can load and initialize
>>>>> hypervisor resource, and also can send command to resource.
>>>>>                b.3 a storage integration test maven project is
>>>>> created, which can test the whole storage subsystem, such as create
>>>>> volume from template, which includes both image and volume
>>>> components.
>>>>>          A new section, called "how to test", is added into
>>>> t
>>>>> em+2.0,
>>>>> please check it out.
>>>>>     The code is on the javelin branch; the maven projects whose
>>>>> names start with cloud-engine-storage-* contain the code related to the
>>>>> storage subsystem. Most of the primary storage code is in the
>>>>> cloud-engine-storage-volume project.
>>>>>      Any feedback/comment is appreciated.
