cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tutkowski, Mike" <Mike.Tutkow...@netapp.com>
Subject Re: [VOTE] Apache Cloudstack 4.10.0.0 - RC2
Date Thu, 11 May 2017 03:26:08 GMT
Hi,

I analyzed the problematic PR and fixed the issue this way:

https://github.com/mike-tutkowski/cloudstack/commit/c463057c82656a4d7564fc9205bb79517317c629

If a couple people could take a look at this code and see what you think, I can create a PR
against the RC and we can make use of this code in the next RC for 4.10.

Thanks!
Mike

On 5/10/17, 2:34 PM, "Tutkowski, Mike" <Mike.Tutkowski@netapp.com> wrote:

    OK, here’s the gist of the problem:
    
    In StorageManagerImpl.cleanupStorage(boolean), the following line in 4.9
    
                        List<VolumeVO> vols = _volsDao.listVolumesToBeDestroyed(new
Date(System.currentTimeMillis() - ((long) StorageCleanupDelay.value() << 10)));
    
    was changed to the following in 4.10
    
                        // ROOT volumes will be destroyed as part of VM cleanup
                        List<VolumeVO> vols = _volsDao.listNonRootVolumesToBeDestroyed(new
Date(System.currentTimeMillis() - ((long) StorageCleanupDelay.value() << 10)));
    
    This leads to a problem (for both managed and traditional storage) in the following situation:
    
    For example: Let’s say we have a system VM running on NFS primary storage. We then put
this primary storage into maintenance mode, which creates the system VM (with the same name)
on a different primary storage (we do not create a new row in the cloud.vm_instance table
for this VM). While this VM works, the original root disk of the system VM remains on the
original primary storage and is not destroyed by the code in StorageManagerImpl.cleanupStorage(boolean)
in 4.10 because 4.10 (as shown above) only asks for non-root volumes to consider for deletion.
In the 4.9 version of the code, the original root disk is cleaned up in StorageManagerImpl.cleanupStorage(boolean).
The problem with 4.10 relying on a root disk always being deleted when the VM it belongs to
is deleted is that in a situation like this that the system VM doesn’t get deleted at this
point – it gets a new root disk that’s hosted by a different primary storage (so now it’s
original root disk is stranded).
    
    Here is the ticket and the PR where the code change went in:
    https://issues.apache.org/jira/browse/CLOUDSTACK-9660
    https://github.com/apache/cloudstack/pull/1825
    
    To me, this needs to be fixed before we release 4.10, so I am -1 on this RC.
    
    My suggestion would be to basically revert PR 1825 and to make use of just bits and pieces
of it.
    
    For example, this part should be kept:
    
    -                            volService.expungeVolumeAsync(volFactory.getVolume(vol.getId()));
     +                            VolumeInfo volumeInfo = volFactory.getVolume(vol.getId());
     +                            if (volumeInfo != null) {
     +                                volService.expungeVolumeAsync(volumeInfo);
     +                            } else {
     +                                s_logger.debug("Volume " + vol.getUuid() + " is already
destroyed");
     +                            }
    
    On 5/10/17, 8:17 AM, "Tutkowski, Mike" <Mike.Tutkowski@netapp.com> wrote:
    
        I've been running regression tests against the release candidate.
        
        So far, all tests but one have passed.
        
        The failing test is related to the storage cleanup thread. It looks like some code
was changed in 4.10 with regards to this thread and that change is causing a problem around
cleanup for managed storage in a particular situation.
        
        As a result of this, I was going to vote -1.
        
        I'm guessing the fix will not be complicated, but is important.
        
        I don't yet have the fix, however. Once I do, I can reply to this thread.
        
        > On May 10, 2017, at 5:47 AM, Rajani Karuturi <rajani@apache.org> wrote:
        > 
        > I agree to your concerns Wido. I did check the PR before creating
        > RC2. There were some outstanding comments on it.
        > 
        > If no one has started testing RC2 and there are no objections, we
        > can cancel this vote, quickly merge the PR and create RC3.
        > 
        > ~ Rajani
        > 
        > http://cloudplatform.accelerite.com/
        > 
        > On May 10, 2017 at 3:04 PM, Wido den Hollander (wido@widodh.nl)
        > wrote:
        > 
        > Op 10 mei 2017 om 0:33 schreef Will Stevens
        > <williamstevens@gmail.com>:
        > 
        > Just to clarify. Wido, the issue you are experiencing is only
        > with basic
        > networks and also exists in 4.9 right? The issue becomes
        > noticeable when
        > you have a lot of networks. Is that a fair statement?
        > 
        > Well, the provisioning is the same between Basic and Advanced.
        > The VR is utterly slow in doing that.
        > 
        > It happens when you have a lot of VMs in those networks.
        > 
        > In our case we have a couple of thousands VMs.
        > 
        > What I'd like to prevent is that this is merged into 4.9.3, but
        > is not in 4.10.
        > 
        > However, I don't want to delay 4.10 any longer.
        > 
        > Wido
        > 
        > On May 9, 2017 5:39 PM, "Wido den Hollander" <wido@widodh.nl>
        > wrote:
        > 
        > +0
        > 
        > I don't want to VOTE -1 due to a bug we are facing, but for us
        > 4.10 would
        > be a problem due to the VR performance.
        > 
        > A PR is open for this, but I also don't want to delay 4.10 any
        > longer:
        > https://github.com/apache/cloudstack/pull/2089
        > 
        > Technically the VR works, it is just that deployment is utterly
        > slow.
        > 
        > Wido
        > 
        > Op 9 mei 2017 om 7:31 schreef Rajani Karuturi
        > <rajani@apache.org>:
        > 
        > Hi All,
        > 
        > I've created a 4.10.0.0 release, with the following artifacts up
        > for a
        > 
        > vote:
        > 
        > Git Branch and Commit SH:
        > https://gitbox.apache.org/repos/asf?p=cloudstack.git;a=commit;h=
        > 
        > fadc80d50f9e95012c9ff3644b60b600c6f65204
        > 
        > Commit:fadc80d50f9e95012c9ff3644b60b600c6f65204
        > Branch: 4.10.0.0-RC20170509T1030
        > 
        > Source release (checksums and signatures are available at the
        > same
        > location):
        > https://dist.apache.org/repos/dist/dev/cloudstack/4.10.0.0/
        > 
        > PGP release keys (signed using CBB44821):
        > https://dist.apache.org/repos/dist/release/cloudstack/KEYS
        > 
        > Vote will be open for 72 hours.
        > 
        > For sanity in tallying the vote, can PMC members please be sure
        > to
        > 
        > indicate
        > 
        > "(binding)" with their vote
        > 
        > [ ] +1 approve
        > [ ] +0 no opinion
        > [ ] -1 disapprove (and reason why)
        > 
        > ~Rajani
        > http://cloudplatform.accelerite.com/
        
    
    

Mime
View raw message