cloudstack-dev mailing list archives

From John Burwell <>
Subject Re: [MERGE]object_store branch into master
Date Mon, 03 Jun 2013 14:18:36 GMT

Please see my comments in-line.


On May 31, 2013, at 4:04 PM, Chip Childers <> wrote:

> Comments inline:
> On Thu, May 30, 2013 at 09:42:29PM +0000, Edison Su wrote:
>>> -----Original Message-----
>>> From: John Burwell []
>>> Sent: Thursday, May 30, 2013 7:43 AM
>>> To:
>>> Subject: Re: [MERGE]object_store branch into master
>>> It feels like we have jumped to a solution without completely understanding
>>> the scope of the problem and the associated assumptions.  We have a
>>> community of hypervisor experts who we should consult to ensure we have
>>> the best solution.  As such, I recommend mailing the list with the specific
>>> hypervisors and functions that you have been unable to interface to storage
>>> that does not present a filesystem.  I do not recall seeing such a discussion
>>> on the list previously.
>> If people use zone-wide primary storage, like Ceph/SolidFire, then suddenly there
>> is no need for NFS cache storage, as zone-wide storage can be treated as both primary
>> and secondary storage, with S3 as the backup storage. It's a simple but powerful solution.
>> Why can't we just add code to support these exciting new solutions? It's hard to do
>> on the master branch; that's why Min and I worked hard to refactor the code and remove
>> the NFS secondary storage dependency from the management server as much as possible. As
>> we all know, NFS secondary storage is not scalable, no matter how fancy an aging policy
>> or how advanced a capacity planner you have.
>> And that's one reason I don't care that much about the issue with NFS cache storage.
>> Couldn't we put our energy into cloud-style storage solutions, instead of into the un-scalable
> Per your comment about you and Min working hard on this: nobody is
> saying that you didn't.  This isn't personal (or shouldn't be).  These
> are questions that are part of a consensus-based approach to
> development.
>>> As I understand the goals of this enhancement, we will support additional
>>> secondary storage types and remove the assumption that secondary
>>> storage will always be NFS or have a filesystem.  As such, when a non-NFS
>>> type of secondary storage is employed, NFS is no longer the repository of
>>> record for this data.  We can always exceed available space in the repository
>>> of record, and the failure scenarios are relatively well understood (4.1.0) --
>>> operations will fail quickly and obviously.  However, as a transitory staging
>>> storage mechanism (4.2.0), the user's expectation is that the NFS storage will
>>> not be as reliable or as large.  If the only solution we can provide for this
>>> problem is to recommend an NFS "cache" that is equal to the size of the
>>> object store itself, then we have made little to no progress addressing our users'
>> No, that's not true.  Admins can add multiple NFS cache storages if they want; there
>> is no requirement that the NFS storage be the same size as the object store. I can't be
>> that stupid.
>> It's the same thing that we are doing on the master branch: admins know that one
>> NFS secondary storage is not enough, so they can add multiple NFS secondary storages. And on
>> the master branch,
>> there is no capacity planner for NFS secondary storage; the code just randomly
>> chooses one of the NFS secondary storages, even if one of them is full. Yes, NFS secondary
>> storage on master can fill up, and there is no way to age data out.
>> On the current object_store branch, the behavior is the same: admins can add multiple
>> NFS cache storages, and there is no capacity planner. However, if the NFS cache storage is
>> full, the admin can simply remove the DB entries related to cached objects and clean up the
>> NFS cache storage, and then everything just works.
>> From an implementation point of view, I don't think there is any difference.
> It's an expectation issue.  Operators expect to be able to manage their
> storage capacity.  So the question is, for the NFS "Cache", how do they
> plan size requirements and manage that capacity?

The driver for employing an object store is to reduce the cost per GB of storage while maintaining
reliability and availability.  Requiring NFS reduces, if not eliminates, this benefit because
system architects must ensure that the NFS "cache" (staging area) has sufficient capacity
and reliability to hold data until it can be transferred to object storage.  How does adding
multiple staging areas decrease complexity and cost?  As implemented, the NFS "cache" is unbounded,
meaning that an operator would need an NFS "cache" as large as object storage to avoid
data loss and/or operational failures.
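To make the "unbounded" point concrete, here is a minimal sketch of what a bounded staging area could look like. It is hypothetical and not code from the object_store branch: the names `StagingCache`, `try_reserve`, and `release` are illustrative assumptions. Space is reserved before an object is copied into the NFS "cache" and released once the transfer to the object store is confirmed, so a full cache fails fast instead of over-committing.

```python
import threading


class StagingCache:
    """Hypothetical bounded NFS staging area (illustrative, not CloudStack code).

    Space is reserved before an object is copied in and released after the
    object is confirmed in the object store and purged from the cache.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.reserved_bytes = 0
        self._lock = threading.Lock()

    def try_reserve(self, size_bytes: int) -> bool:
        # Atomically claim space; refuse (fail fast) if it would exceed capacity.
        with self._lock:
            if self.reserved_bytes + size_bytes > self.capacity_bytes:
                return False
            self.reserved_bytes += size_bytes
            return True

    def release(self, size_bytes: int) -> None:
        # Give space back once the staged copy has been removed.
        with self._lock:
            self.reserved_bytes = max(0, self.reserved_bytes - size_bytes)

    def free_bytes(self) -> int:
        with self._lock:
            return self.capacity_bytes - self.reserved_bytes


GIB = 1024 ** 3
cache = StagingCache(100 * GIB)
print(cache.try_reserve(60 * GIB))  # True
print(cache.try_reserve(50 * GIB))  # False: only 40 GiB free, so fail fast
cache.release(60 * GIB)
print(cache.free_bytes() == 100 * GIB)  # True
```

With a reservation in place, the cache only needs to be sized for data in flight rather than for the whole object store, which is the property the current implementation lacks.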

>>> needs.  Fundamentally, the role of NFS is different in 4.2.0 than in 4.1.0.
>>> Therefore, I disagree with the assertion that this issue is present in 4.1.0.
>> The role of NFS can change, but they share the same problem: no capacity planner
>> and no aging-out policy.
> Secondary storage capacity management is much easier to grok for
> operators.  I would bet that almost 100% of the time, their usage grows
> on a particular slope, allowing them to plan and allocate more when
> needed.
> For the NFS "cache", the lifecycle of objects stored in that location,
> especially cleanup routines, is going to be critical to the healthy
> operation of that environment.
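As an illustration of the kind of cleanup routine Chip describes, the sketch below assumes a hypothetical LRU aging policy (`AgingCache`, `CachedObject`, and `mark_uploaded` are illustrative names, not CloudStack classes). Objects are evicted in least-recently-used order, but only once their copy in the object store has been confirmed, so the cache never discards the sole copy of any data.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedObject:
    size_bytes: int
    in_object_store: bool  # True once the durable copy is confirmed


class AgingCache:
    """Hypothetical LRU aging policy for an NFS staging area."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Key order doubles as recency order: first key = least recently used.
        self._entries: OrderedDict[str, CachedObject] = OrderedDict()

    def add(self, key: str, obj: CachedObject) -> None:
        # Replacing an existing key must not double-count its size.
        old = self._entries.pop(key, None)
        if old is not None:
            self.used_bytes -= old.size_bytes
        self._entries[key] = obj
        self.used_bytes += obj.size_bytes

    def touch(self, key: str) -> None:
        # Record a read so the object moves to the most-recently-used end.
        self._entries.move_to_end(key)

    def mark_uploaded(self, key: str) -> None:
        # Called once the object store confirms the copy.
        self._entries[key].in_object_store = True

    def evict(self, needed_bytes: int) -> list[str]:
        """Evict LRU objects that are safe in the object store until
        `needed_bytes` are free. Objects whose only copy is in the cache
        are never evicted."""
        evicted: list[str] = []
        for key in list(self._entries):
            if self.capacity_bytes - self.used_bytes >= needed_bytes:
                break
            obj = self._entries[key]
            if obj.in_object_store:
                del self._entries[key]
                self.used_bytes -= obj.size_bytes
                evicted.append(key)
        return evicted
```

Note that `evict` may free less than requested when the remaining objects have not yet reached the object store; a caller would then have to fail the operation or wait for pending transfers, which is exactly where operators need visibility.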


>>> An additional risk in the object_store implementation is that we lead a user
>>> to believe their data has been stored in reliable storage (e.g. S3, Riak CS)
>>> when it may not have been.  I saw no provision in the object_store to retry transfers if
>> I don't know from which code you drew this conclusion. Could you point it
>> out in the code?
>> AFAIK, the object can only be either stored in S3 or not stored in S3; I don't know
>> how the object can be in a wrong state.
>>> the object_store transfer fails or becomes unavailable.  In 4.0.0/4.1.0, if we
>>> can't connect to S3 or Swift, a background process continuously retries the
>>> upload until successful.
>> Here is where an interesting situation comes up: how do the mgt server or admin know
>> that the background process pushed the objects into S3 successfully? There is no guarantee
>> the background process will succeed, and there is no status tracking for this background
>> process, right?
>> What I am doing on the object_store branch is this: if pushing an object into S3 fails,
>> then the whole backup process fails, and the admin or user needs to send another API request
>> to push the object into S3. This guarantees that the operation will either succeed or fail,
>> instead of ending up in an unknown state as on the master branch.
> That's the right approach IMO (at least it's correct, per the current
> model of operations either working or not).

As I previously stated, this functionality is a step back from the current Swift and S3 implementations
present in 4.1.0.  I also think it is an unreasonable burden to place on an operator to check
that every possible transfer succeeded and then issue a retry of the copy.
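A middle ground between the two positions seems possible: retry transient failures automatically, as the 4.1.0 background process does, but bound the retries and record a definite terminal state, which addresses Edison's concern about unknown states. The sketch below is hypothetical (`upload_with_retry` and `TransferState` are illustrative names), and it assumes transient failures surface as `OSError` subclasses such as `ConnectionError`.

```python
import time
from enum import Enum
from typing import Callable


class TransferState(Enum):
    UPLOADED = "uploaded"
    FAILED = "failed"


def upload_with_retry(
    upload: Callable[[], None],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> TransferState:
    """Retry a transfer with exponential backoff, then report a definite state.

    `upload` is any callable that raises OSError (e.g. ConnectionError) on a
    transient failure. `sleep` is injectable so the backoff can be tested.
    """
    for attempt in range(max_attempts):
        try:
            upload()
            return TransferState.UPLOADED
        except OSError:
            # Back off 1x, 2x, 4x, ... the base delay; no sleep after the last try.
            if attempt < max_attempts - 1:
                sleep(base_delay_s * (2 ** attempt))
    return TransferState.FAILED
```

Persisting the returned state alongside the object's record would, presumably, let the management server answer whether a push actually succeeded without requiring the operator to check every transfer and reissue the copy by hand.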

I am also curious about the phrase "backup".  My understanding of this branch's goals was
to support object stores as native secondary storage.  4.1.0 already supports backing up secondary
storage to Swift and S3.  Is your vision for object_store that object stores can be used as
native secondary storage?

>>> Finally, I see this issue as a design issue rather than a bug.  I don't think we should
>> Again, I don't think it's a design issue; as I said above, it's a bug, and both master
>> and object_store have the same bug. It can be fixed, and it's easier to fix on object_store
>> than on master. And it's not an important issue compared to supporting
>> cloud-style storage solutions.
> Can we discuss fixing it in the object_store branch then?

Could you please define what you mean by a cloud-style storage solution?

>>> Given the different use of NFS in the object_store branch vs. current, I don't
>>> see the comparison in this case.  In the current implementation, when we
>>> exhaust space, we are truly out of resource.  However, in the object_store
>>> branch, we have no provision to remove stale data and we may report no
>>> space available when there is plenty of space available in the underlying
>>> object store.  In this scenario, the NFS "cache" becomes an artificial limiter
>>> of the capacity of the system.  I do not understand how we have this problem in
>>> current since the object store is only a backup of secondary store -- not
>>> secondary storage itself.
>> As I said before, no matter what the role of NFS storage is, it shares the same issue:
>> in both cases NFS storage can run out of capacity, with no capacity planner and no aging policy.
> But as I note above, the operator's planning process will be quite
> difficult.

Also, as I previously noted, the exhaustion has a completely different cause.  In 4.1, I am
truly out of secondary storage.  As Chip mentioned, it is straightforward to plan for
space requirements.  In object_store, I have likely not exhausted secondary storage space,
but have filled the cache.  Since most operators will want as little NFS space as necessary
in this scenario, my educated guess is that we will see exhaustion of the cache far more frequently.

>>> It is my estimate that robust error handling will require design changes (e.g.
>>> introduction of a resource reservation mechanism, introduction of additional
>>> exception classes, enhancement of interfaces to provide more context
>>> regarding client intentions, etc.) yielding significant code impact.  These
>>> changes need to be undertaken in a holistic manner with minimum risk to
>>> master.   Fundamentally, we should not be merging code to master with
>>> known significant issues.  When it goes to master, we should be saying, "To
>>> the best of my knowledge and developer testing, there are no blocker or
>>> critical issues."  In my opinion, omission of robust error handling does not
>>> meet that standard.
>> To be realistic, on the mgt server there is only one class that depends on
>> cache storage, and only one interface needs to be implemented to solve the issue. Why do
>> we need a redesign?
> Right, let's look at how to deal with it cleanly within that
> implementation (although I suspect that the changes will leak out of
> that class).

The lack of error handling extends beyond the cache.  The entire branch needs to be evaluated
for exception handling.
