cloudstack-dev mailing list archives

From Edison Su <>
Subject RE: [MERGE]object_store branch into master
Date Tue, 28 May 2013 23:42:49 GMT

> -----Original Message-----
> From: John Burwell []
> Sent: Tuesday, May 28, 2013 8:43 AM
> To:
> Subject: Re: [MERGE]object_store branch into master
> All,
> I have gone through a large chunk of this patch, and published my review
> thus far (   TL;DR is that this patch has a
> number of significant issues which can be summarized as follows:

Thanks for taking the time to review this patch.

> 1. While it appears that the requirement of NFS for secondary storage has
> largely been removed, it has basically been if-blocked out instead of pushed
> down as an implementation detail in the physical layer.  Rather than exploiting
> polymorphism to vary behavior through a set of higher-level abstractions, the
> orchestration performs instanceof NFSTO checks.  The concern is that future
> evolution of the secondary storage layer will still have dependencies on NFS.

As long as the management server doesn't explicitly depend on NFS secondary storage during
orchestration, and NFS storage can be enabled based on different configurations, the
resource-side "NFSTO checks" are not much of a concern to me at the current stage. We can add
a "polymorphic high-level abstraction" at the resource side, as you suggested before, in the
future. The current code change is already huge; adding a new level of abstraction would make
it even bigger. But adding this kind of "high-level abstraction" is not hard any more, as all
the storage-related commands deal only with DataTO/DataStoreTO. Right now there are no
read/write data operations on these TOs; one possible way to add a "high-level abstraction"
would be to add read/write data operations in each hypervisor resource, based on the DataTO
and DataStoreTO.
My point is that refactoring the management server code is much, much harder than refactoring
the resource code. If we can get the management server right, we can move on to refactoring
the resource code in the future.
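To make the idea concrete, here is a minimal sketch of what such a resource-side abstraction
could look like. All names below (DataProcessor, InMemoryProcessor, and the shapes of
DataTO/DataStoreTO) are illustrative assumptions for the sketch, not actual CloudStack
interfaces:

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceAbstractionSketch {
    // Illustrative stand-ins for the real transfer objects.
    interface DataTO { String getPath(); }
    interface DataStoreTO { String getUrl(); }

    /** Read/write operations a hypervisor resource could implement per
     *  store type, so code dispatches polymorphically instead of
     *  performing instanceof checks on NFS-specific TOs. */
    interface DataProcessor {
        byte[] read(DataTO data, DataStoreTO store);
        void write(DataTO data, DataStoreTO store, byte[] content);
    }

    /** Toy in-memory processor standing in for an NFS- or S3-backed one. */
    static class InMemoryProcessor implements DataProcessor {
        private final Map<String, byte[]> blobs = new HashMap<>();
        public byte[] read(DataTO d, DataStoreTO s) {
            return blobs.get(s.getUrl() + d.getPath());
        }
        public void write(DataTO d, DataStoreTO s, byte[] c) {
            blobs.put(s.getUrl() + d.getPath(), c);
        }
    }

    public static void main(String[] args) {
        DataProcessor p = new InMemoryProcessor();
        DataTO template = () -> "/template/202/routing.vhd";
        DataStoreTO store = () -> "nfs://host/secondary";
        p.write(template, store, "payload".getBytes());
        System.out.println(new String(p.read(template, store)));
    }
}
```

Each store type would then plug in its own DataProcessor, and the orchestration would never
need to know which backend it is talking to.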

> 2. In some scenarios, NFS is still a requirement for secondary storage.  In
> particular, Xen users will still have to maintain an "NFS cache".  Given the
> degradation of capability, I think it would be helpful to put a few more
> eyeballs on the Xen implementation to determine if we could avoid using an
> intermediate file system.

NFS will not disappear soon for some hypervisors and some types of storage. But as long as the
management server doesn't have this requirement, people can add new types of hypervisors (e.g.
Hyper-V) and new types of storage (Ceph, SolidFire, etc.) which don't need NFS. Optimization
for a particular hypervisor can be done by other people, who may be familiar with a certain
type of storage or a certain hypervisor.

> 3. I have the following concerns regarding potential race conditions and
> resource exhaustion in the NFS cache implementation.
>     - The current random allocator algorithm does not account for the amount
> of space that will be required for the operation (i.e. checking to ensure that
> the cache it picks has enough space to hold the object being downloaded), nor
> does it reserve the space.  Given the long (in compute time) running nature of
> these processes, the available space in a cache could be exhausted by a number
> of concurrently transferring templates and/or snapshots.  By reserving space
> before the transfer, the allocator would be able to account for both pending
> operations and the current contents of the cache.

1. This is the current behavior of the 4.1 and master secondary storage code; we didn't
introduce a new issue here.
2. Adding a new cache storage allocator here is much easier than adding it in the master branch.
3. I treat it as a bug, and plan to fix it after the merge.
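For reference, a space-reserving allocator along the lines John describes could be sketched as
follows; the CacheStore class and its accounting fields are hypothetical, not existing
CloudStack code:

```java
public class CacheReservationSketch {
    /** Illustrative allocator state that reserves space before a transfer
     *  so concurrent downloads cannot oversubscribe an NFS cache. */
    static class CacheStore {
        private final long capacityBytes;
        private long usedBytes;      // completed objects on the cache
        private long reservedBytes;  // in-flight transfers

        CacheStore(long capacityBytes) { this.capacityBytes = capacityBytes; }

        /** Reserve space up front; reject if used + reserved would overflow. */
        synchronized boolean tryReserve(long size) {
            if (usedBytes + reservedBytes + size > capacityBytes) return false;
            reservedBytes += size;
            return true;
        }
        /** Transfer finished: move the reservation into used space. */
        synchronized void commit(long size) {
            reservedBytes -= size;
            usedBytes += size;
        }
        /** Transfer failed: give the reservation back. */
        synchronized void abort(long size) {
            reservedBytes -= size;
        }
    }

    public static void main(String[] args) {
        CacheStore cache = new CacheStore(100L);
        System.out.println(cache.tryReserve(60)); // fits
        System.out.println(cache.tryReserve(60)); // rejected: would exceed capacity
        cache.commit(60);
        System.out.println(cache.tryReserve(30)); // fits alongside committed data
    }
}
```

An allocator built on this would skip any cache whose tryReserve fails and move on to the
next candidate.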

>     - There appears to be no mechanism to age out the contents of the cache.
> Therefore, as implemented, it will grow unbounded.  The only workaround
> for this problem would be to have an NFS cache whose size equals that of
> the object store.

The admin can add multiple NFS cache storages to one zone, and this is the current behavior on
the master branch. Again, we can add an aging policy later, as the cache storage has its own
POM project and its own implementation.
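As a rough illustration of what a later aging policy could look like, here is a minimal LRU
index built on java.util.LinkedHashMap's access-order mode; the class and the idea of keying
cached objects by name are assumptions for the sketch, not part of the branch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheAgingSketch {
    /** Illustrative LRU aging policy for cached templates/snapshots:
     *  once the index exceeds maxEntries, the least-recently-used entry
     *  is dropped (a real implementation would also delete the file). */
    static class LruCacheIndex extends LinkedHashMap<String, Long> {
        private final int maxEntries;

        LruCacheIndex(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true: gets count as use
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
            return size() > maxEntries; // eviction hook
        }
    }

    public static void main(String[] args) {
        LruCacheIndex idx = new LruCacheIndex(2);
        idx.put("template-a", 1L);
        idx.put("template-b", 2L);
        idx.get("template-a");      // touch a, so b becomes the eldest
        idx.put("template-c", 3L);  // evicts template-b
        System.out.println(idx.keySet());
    }
}
```

Because the cache storage lives in its own project, a policy like this could slot in without
touching the orchestration code.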

>     - The mechanism lacks robust error handling or retry logic.  In particular, the
> behavior if/when a cache exhausts available space appears non-deterministic.

Again, the master branch has the same issue, and fixing it here will be easier than fixing it
on the master branch.

>     - Generally, I see little consideration for alternative/exception flows.  For
> example, what happens if I attempt to use a template/iso/snapshot in transit
> to the object store to/from the cache?  Since these files can be very large
> (multiple GBs), we have to assume some transfer latency in all interactions.

The management server mainly maintains the state of each object (volume/snapshot/template). We
added a state machine for all the objects, and for all the objects on each storage, which is
unavailable on the master branch. Take copying a template from the object store to the cache
as an example: the state of the template entry on both the object storage and the cache
storage will be changed to "Copying"; if the copy fails or times out, the state of the
template entry on the object storage will be changed back to "Ready", while the entry on the
cache storage will be moved to a failed state.
As long as the management server maintains the state correctly, the user/admin can issue the
same operation again through the UI/API in case an operation fails. That's one of the reasons
there is no retry logic in the storage subsystem code. Another reason is that adding retry
logic on top of an async API would be a little difficult.
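A stripped-down sketch of the state handling described above might look like this; the state
names and transitions are simplified illustrations for the sketch, not the exact CloudStack
state machine:

```java
public class CopyStateSketch {
    // Illustrative states only; the real state machine has more states
    // and drives transitions through events.
    enum State { ALLOCATED, COPYING, READY, FAILED }

    static class ObjectEntry {
        State state = State.ALLOCATED;

        void startCopy() {
            if (state == State.COPYING)
                throw new IllegalStateException("copy already in progress");
            state = State.COPYING;
        }
        void copySucceeded() { state = State.READY; }
        /** On failure/timeout the source-side entry goes back to READY,
         *  while the destination-side entry is marked FAILED; the user can
         *  simply re-issue the operation, so no in-band retry is needed. */
        void copyFailed(boolean isSourceSide) {
            state = isSourceSide ? State.READY : State.FAILED;
        }
    }

    public static void main(String[] args) {
        ObjectEntry onObjectStore = new ObjectEntry();
        ObjectEntry onCache = new ObjectEntry();
        onObjectStore.startCopy();
        onCache.startCopy();
        onObjectStore.copyFailed(true); // back to READY on the object store
        onCache.copyFailed(false);      // FAILED on the cache; retry via API
        System.out.println(onObjectStore.state + " / " + onCache.state);
    }
}
```

The key property is that every failure leaves both entries in a well-defined state, so
re-issuing the operation through the API is always safe.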

> 4. The Image Cache abstraction is too tightly coupled to the core
> orchestration mechanism.  I would recommend implementing it as a

The only place coupled to the image cache abstraction is the DataMotionStrategy; in
particular, AncientDataMotionStrategy->needCacheStorage, and AncientDataMotionStrategy will
call StorageCacheManager->createCacheObject if cache storage is needed.
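That coupling point can be sketched like this; DataObject and the method shapes here are
hypothetical stand-ins for the real interfaces, kept only to show where the decision lives:

```java
public class CacheDecisionSketch {
    // Illustrative types; the real CloudStack interfaces differ.
    interface DataObject { String getUri(); }
    interface StorageCacheManager { DataObject createCacheObject(DataObject src); }

    /** Mirrors the described flow: the motion strategy alone decides
     *  whether the source must first be staged on cache storage; the
     *  storage providers never see this decision. */
    static DataObject stageIfNeeded(DataObject src, boolean needCacheStorage,
                                    StorageCacheManager cacheMgr) {
        return needCacheStorage ? cacheMgr.createCacheObject(src) : src;
    }

    public static void main(String[] args) {
        StorageCacheManager mgr = src -> () -> "nfs-cache:" + src.getUri();
        DataObject template = () -> "s3://bucket/template.vhd";
        System.out.println(stageIfNeeded(template, true, mgr).getUri());
        System.out.println(stageIfNeeded(template, false, mgr).getUri());
    }
}
```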

> DataStore proxy that is applied as necessary.  For example, the current Xen
> hypervisor implementation must interact with a file system.  It seems most
> appropriate that the Xen hypervisor implementation would apply the proxy
> to its secondary storage DataStore instances.  Therefore, the storage engine

The issue that has troubled me for a while is how and where to configure cache storage. Take
S3 as an example: the S3 provider implementation should not need to care about cache storage
at all; it should only care about how to deal with data stored in S3. S3 is a general-purpose
storage, so the same implementation can be used by different hypervisors and in different
situations. For example, if both Ceph and S3 are used by KVM, then an NFS cache is definitely
not needed, while for other cluster-wide primary storages supported by KVM, an NFS cache is
needed.
How and where do we present this kind of configuration, such that neither the primary storage
provider nor the object storage provider implementation needs to be aware of it? My current
solution is to put the decision in the code: DataMotionStrategy is the place to decide whether
to use cache storage. In the default implementation, all copy operations from S3 to primary
storage go through cache storage; if people add a new storage and want to bypass the cache for
some reason, they can add a new DataMotionStrategy.
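To illustrate that extension path, here is a hedged sketch of a vendor strategy bypassing the
cache for a direct-copy-capable pair (e.g. S3 to Ceph on KVM); the interface shape and the
selection loop are assumptions for the sketch, not the actual strategy chain:

```java
import java.util.Arrays;
import java.util.List;

public class StrategySelectionSketch {
    // Illustrative shape only; the real DataMotionStrategy is richer.
    interface DataMotionStrategy {
        boolean canHandle(String srcStore, String destStore);
    }

    /** Vendor strategy claiming direct S3 -> Ceph copies, no NFS cache. */
    static class DirectCephStrategy implements DataMotionStrategy {
        public boolean canHandle(String src, String dest) {
            return "s3".equals(src) && "ceph".equals(dest);
        }
    }

    /** Default fallback: stage through the NFS cache for every pair. */
    static class CachedCopyStrategy implements DataMotionStrategy {
        public boolean canHandle(String src, String dest) { return true; }
    }

    /** First strategy that claims the store pair wins; the cache-based
     *  default is consulted last. */
    static DataMotionStrategy select(List<DataMotionStrategy> strategies,
                                     String src, String dest) {
        for (DataMotionStrategy s : strategies) {
            if (s.canHandle(src, dest)) return s;
        }
        throw new IllegalStateException("no strategy for " + src + "->" + dest);
    }

    public static void main(String[] args) {
        List<DataMotionStrategy> strategies =
            Arrays.asList(new DirectCephStrategy(), new CachedCopyStrategy());
        System.out.println(select(strategies, "s3", "ceph").getClass().getSimpleName());
        System.out.println(select(strategies, "s3", "nfs-primary").getClass().getSimpleName());
    }
}
```

Neither the S3 provider nor the Ceph provider ever sees the cache decision; it stays entirely
inside the strategy layer.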

I am not sure how the DataStore proxy would solve the issue I mentioned above; I would be glad
to hear your solution.

> should only provide the facility, not attempt to determine when it is needed.
> Additionally, a proxy model would greatly reduce the size and complexity of
> the various classes (e.g. AncientDataMotionStrategy).

The current AncientDataMotionStrategy is only about 400 LOC now; I removed some of the
duplicated code. But I still want to know the details of your solution.

> 5. While I have only reviewed roughly 50-60% of the patch thus far, I see little
> to no additional unit tests for the new code added.  Integration tests have

If the code is changing rapidly, writing and maintaining unit test cases takes huge effort, so
we focused on the integration tests and Marvin tests at this stage.

> been expanded, but many of the tests simply execute the service methods
> and do not verify the form of the operation results. There are also a
> tremendous number of ignored exceptions that should likely fail these tests.
> Therefore, we could have tests passing due to an ignored exception that
> should be failing.

1. Thanks for taking a look at the integration tests; they have been very convenient for us
during the implementation and saved us a lot of time during testing.
2. Yes, the integration tests need to be improved as you suggested.
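As an illustration of the difference John points out, the following hypothetical helper
contrasts a test that swallows exceptions with one that verifies the result; copyTemplate is
invented for the sketch and is not a real CloudStack API:

```java
public class VerifiedTestSketch {
    // Hypothetical service call used only for illustration.
    static String copyTemplate(boolean simulateFailure) throws Exception {
        if (simulateFailure) throw new Exception("copy failed");
        return "Ready";
    }

    /** Anti-pattern: swallow the exception, so the test stays "green"
     *  no matter what actually happened. */
    static boolean runsWithoutVerifying() {
        try { copyTemplate(true); } catch (Exception ignored) { }
        return true;
    }

    /** Better: let the exception fail the test, and assert on the
     *  resulting state rather than on the call merely returning. */
    static boolean runsAndVerifies() throws Exception {
        return "Ready".equals(copyTemplate(false));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runsWithoutVerifying()); // passes despite the failure
        System.out.println(runsAndVerifies());      // passes because state checked
    }
}
```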

> While I recognize the tremendous effort that has been expended to
> implement this capability, and the value of the feature, I see significant design
> and operational issues that must be addressed before it can be accepted into

I would say the object_store branch is simply better than the master branch: almost all the
issues (related to cache storage) you pointed out already exist on the master branch, and
fixing these issues on master is harder than on object_store. Frankly speaking, I don't want
to read/understand/guess how it works on the master branch; it's a lot of hacks all over the
place.
Of course, these issues are still issues, and we need to fix them, but when to fix them is
arguable. I prefer fixing them after the merge. Here is the reason:
If we agree that the issues you pointed out also exist on master, do we want to fix them in
4.2 or not? If the answer is yes, then who will fix them, and how long will it take to fix
them on master? If I offer to fix these issues after merging object_store into master, would
that be acceptable? If the answer is no, then there is no technical reason not to merge
object_store into master, as these issues were not introduced by the object_store branch.

> master.  In my opinion, the object_store represents a good first step, but it
> needs a few more review/refinement iterations before it will be ready for a
> master merge.
> Thanks,
> -John
> On May 28, 2013, at 10:12 AM, Nitin Mehta <> wrote:
> > Agree with Wido. This would be a great feature to have in 4.2. Yes,
> > it's a lot of change and probably needs more education from Edison and
> > Min, maybe through code walkthroughs, documentation, and IRC meetings,
> > but I am +1 for this to make it to 4.2 and would go so far as to say
> > that I would even volunteer for any bug fixes required.
> >
> > I would say it's not too bad to merge it now, as most of the features
> > for 4.2 are merged by now and not a lot of them would be blocked
> > because of this. Yes, master would be unstable, but it would be even
> > if we merge it post cutting the 4.2 branch. I would rather see this
> > coming in 4.2 than wait another 6 months or so for it. Yes, this is an
> > architectural change and we are learning as a community to time these
> > kinds of changes.
> > We should also try to raise alarms for these changes much earlier,
> > when the FS is proposed rather than when it's done; probably a
> > learning for all of us :)
> >
> > Thanks,
> > -Nitin
> >
> > On 28/05/13 4:23 PM, "Wido den Hollander" <> wrote:
> >
> >>
> >>
> >> On 05/23/2013 06:35 PM, Chip Childers wrote:
> >>> On Wed, May 22, 2013 at 09:25:10PM +0000, Edison Su wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chip Childers []
> >>>>> Sent: Wednesday, May 22, 2013 1:26 PM
> >>>>> To:
> >>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>
> >>>>> On Wed, May 22, 2013 at 08:15:41PM +0000, Animesh Chaturvedi wrote:
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Chip Childers []
> >>>>>>> Sent: Wednesday, May 22, 2013 12:08 PM
> >>>>>>> To:
> >>>>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>>>
> >>>>>>> On Wed, May 22, 2013 at 07:00:51PM +0000, Animesh Chaturvedi wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: John Burwell []
> >>>>>>>>> Sent: Tuesday, May 21, 2013 8:51 AM
> >>>>>>>>> To:
> >>>>>>>>> Subject: Re: [MERGE]object_store branch into master
> >>>>>>>>>
> >>>>>>>>> Edison,
> >>>>>>>>>
> >>>>>>>>> Thanks, I will start going through it today.  Based on other
> >>>>>>>>> $dayjob responsibilities, it may take me a couple of days.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> -John
> >>>>>>>> [Animesh>] John we are just a few days away from 4.2 feature
> >>>>>>>> freeze, can you provide your comments by Friday 5/24.  I would
> >>>>>>>> like all feature threads to be resolved sooner so that we don't
> >>>>>>>> have last minute
> >>>>>>>>
> >>>>>>> I'm just going to comment on this, but not take it much further:
> >>>>>>> this type of change is an "architectural" change.  We had
> >>>>>>> previously discussed (on several threads) that the appropriate
> >>>>>>> time for this sort of thing to hit master was *early* in the
> >>>>>>> release cycle.  Any reason that that consensus doesn't apply here?
> >>>>>> [Animesh>] Yes it is an architectural change and discussion on
> >>>>>> this started a few weeks back already, Min and Edison wanted to
> >>>>>> get it in sooner by 4/30 but it took longer than anticipated in
> >>>>>> preparing for merge and testing on the feature branch.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> You're not following me I think.  See this thread on the Javelin
> >>>>> merge:
> >>>>>
> >>>>>
> >>>>>
> >>>>> We have discussed that our preference is for architectural changes
> >>>>> to hit master shortly after a feature branch is cut.  Why are we
> >>>>> not doing that here?
> >>>>
> >>>> This kind of refactor takes time, a lot of time. I think I worked
> >>>> on the merge of primary storage refactor into master and bug fixes
> >>>> during March( 14469 ), then started to work on the secondary
> >>>> storage refactor in April( Min and I finished the coding at the end
> >>>> of April, then tested for two weeks, and sent out the merge request
> >>>> in the middle of May.
> >>>> With the refactor, the storage code will be much cleaner, the
> >>>> performance of S3 will be improved, integration with other storage
> >>>> vendors will be much easier, and the quality is ok(33 bugs fired,
> >>>> only 5 left: re_Re factor%22). Anyway, it's up to the community to
> >>>> decide, merge it or not, we already tried our best to get it done
> >>>> ASAP.
> >>>>
> >>>>
> >>>
> >>> I'm absolutely not questioning the time and effort here.  I know
> >>> that you have been working hard, and that testing is happening!
> >>>
> >>> I'm only asking if we, as a community, want to follow the practice
> >>> of bringing changes like this in early or late in a cycle.  I
> >>> thought we had agreed on doing it early.
> >>>
> >>
> >> So I tried reviewing the code, but I have to say that it is a lot of
> >> code. Reviewing such a large piece of code isn't easy.
> >>
> >> Now, let me be honest, I'd love to see this in 4.2 since it would
> >> make the Ceph integration a lot better. We can get rid of NFS as
> >> Secondary Storage and use Ceph as the only storage for CS.
> >>
> >> Yes, it might need some work after this branch has been merged, but I
> >> do agree that it's a lot of work to maintain a branch next to master.
> >> Even with smaller fixes you have to do a lot to keep up.
> >>
> >> Imho a feature freeze is a feature freeze. It's set for May 31st and
> >> afterwards we start ironing the bugs out, but no new merges from
> >> other branches.
> >>
> >> We will need the full support from Edison and Co to help iron out
> >> these bugs. Maybe something will be broken after the merge and that
> >> should be fixed asap then.
> >>
> >> Again, my opinion in this is a bit coloured, but I think this will be
> >> a great addition to CloudStack, it would make 4.2 a killer release.
> >>
> >> Wido
> >
