cloudstack-dev mailing list archives

From mlsorensen <...@git.apache.org>
Subject [GitHub] cloudstack pull request #1709: CLOUDSTACK-7982 - KVM live migration
Date Fri, 21 Oct 2016 16:07:42 GMT
Github user mlsorensen commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1709#discussion_r84509497
  
    --- Diff: core/src/com/cloud/agent/api/CancelMigrationCommand.java ---
    @@ -0,0 +1,35 @@
    +// Licensed to the Apache Software Foundation (ASF) under one
    +// or more contributor license agreements.  See the NOTICE file
    +// distributed with this work for additional information
    +// regarding copyright ownership.  The ASF licenses this file
    +// to you under the Apache License, Version 2.0 (the
    +// "License"); you may not use this file except in compliance
    +// with the License.  You may obtain a copy of the License at
    +//
    +//   http://www.apache.org/licenses/LICENSE-2.0
    +//
    +// Unless required by applicable law or agreed to in writing,
    +// software distributed under the License is distributed on an
    +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +// KIND, either express or implied.  See the License for the
    +// specific language governing permissions and limitations
    +// under the License.
    +package com.cloud.agent.api;
    +
    +public class CancelMigrationCommand extends Command {
    --- End diff --
    
    Along these lines of cancellation, I've long thought that we really need the ability to
    clean up long-running jobs if the agent disconnects from the management server for any
    reason (say, an upgrade or restart of the agent or management server, network problems,
    etc.). Normally the management server knows the job failed, but the agent keeps on
    trucking, which causes problems, especially for things like storage migrations. This may
    be an important thing to add for this feature, to avoid situations where a migration
    completes but CloudStack does not know about it because the management server was
    restarted during the migration.
    
    Rather than requiring the management server to know which agent work needs to be cleaned
    up, and to send the hypervisor a cleanup command tailored to each command that can fail,
    one approach I've seen work well is for LibvirtComputingResource to hold a global
    List<Runnable> of tasks and to override the disconnected() method so that it loops
    through this list and runs each task. It also exposes addDisconnectHook(Runnable hook)
    and removeDisconnectHook(Runnable hook), so that a command that is sensitive to being
    interrupted can register its cancellation logic before it starts and remove it once it
    finishes.
    
    Something like:
    
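        // Hooks registered by in-flight commands. Copy-on-write, so a command can add or
        // remove its hook from another thread while disconnected() is iterating.
        private final List<Runnable> _disconnectHooks = new CopyOnWriteArrayList<>();

        @Override
        public void disconnected() {
            this._connected = false;
            s_logger.info("Detected agent disconnect event, running through " + _disconnectHooks.size()
                    + " disconnect hooks");
            // Run every hook's cancellation logic; one failing hook must not skip the rest.
            for (Runnable hook : _disconnectHooks) {
                try {
                    hook.run();
                } catch (RuntimeException e) {
                    s_logger.warn("Disconnect hook " + hook + " failed", e);
                }
            }
            _disconnectHooks.clear();
        }

        public void addDisconnectHook(Runnable hook) {
            s_logger.debug("Adding disconnect hook " + hook);
            _disconnectHooks.add(hook);
        }

        public void removeDisconnectHook(Runnable hook) {
            // List.remove(Object) returns false if the hook was never registered
            // or has already been cleared by disconnected().
            if (_disconnectHooks.remove(hook)) {
                s_logger.debug("Removed disconnect hook " + hook);
            } else {
                s_logger.debug("Requested removal of disconnect hook, but hook not found: " + hook);
            }
        }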
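    To make the register-before-starting, remove-when-finished pattern concrete, here is a
    minimal, self-contained sketch of the registry pulled out into its own class.
    DisconnectHookRegistry and GuardedCommand are hypothetical names, not CloudStack classes.
    The sketch assumes hooks may be added or removed from command threads while
    disconnected() is running, which is why it uses a CopyOnWriteArrayList and removes each
    hook before running it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical standalone registry illustrating the disconnect-hook idea;
// not an actual CloudStack class.
class DisconnectHookRegistry {
    // Copy-on-write: iteration works on a snapshot, so commands may add or
    // remove hooks from other threads while disconnected() is running.
    private final List<Runnable> hooks = new CopyOnWriteArrayList<>();

    void addDisconnectHook(Runnable hook) {
        hooks.add(hook);
    }

    // Returns false if the hook was never registered or has already been run.
    boolean removeDisconnectHook(Runnable hook) {
        return hooks.remove(hook);
    }

    // Runs each registered hook at most once and returns how many completed
    // without throwing; a failing hook does not prevent the others from running.
    int disconnected() {
        int completed = 0;
        for (Runnable hook : hooks) {
            // Whoever removes the hook first owns it, so a hook the command has
            // already removed in its finally block is skipped here.
            if (!hooks.remove(hook)) {
                continue;
            }
            try {
                hook.run();
                completed++;
            } catch (RuntimeException e) {
                System.err.println("Disconnect hook " + hook + " failed: " + e);
            }
        }
        return completed;
    }

    int size() {
        return hooks.size();
    }
}

// Hypothetical helper showing how a command brackets its long-running work.
class GuardedCommand {
    static void run(DisconnectHookRegistry registry, Runnable cancel, Runnable work) {
        registry.addDisconnectHook(cancel); // register before starting the work
        try {
            work.run();
        } finally {
            registry.removeDisconnectHook(cancel); // no-op if a disconnect already ran it
        }
    }
}
```

    Removing each hook before running it means a hook fires at most once: whichever of
    disconnected() or the command's finally block removes it first wins, so a command that
    has already finished does not have its cancellation logic run by a later disconnect.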
    
    An example hook to cancel the migration might look like this:
    
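        import org.apache.log4j.Logger;
        import org.libvirt.Domain;
        import org.libvirt.LibvirtException;

        // Extends Thread, but only run() is used: disconnected() calls hook.run() directly,
        // and the thread name makes the hook recognizable when it is logged.
        public class MigrationCancelHook extends Thread {
            private static final Logger LOGGER = Logger.getLogger(MigrationCancelHook.class.getName());
            private static final String HOOK_PREFIX = "MigrationCancelHook-";
            private final Domain _migratingDomain;
            private final String _vmName;

            // Domain.getName() can throw LibvirtException, hence the throws clause.
            public MigrationCancelHook(Domain migratingDomain) throws LibvirtException {
                super(HOOK_PREFIX + migratingDomain.getName());
                _migratingDomain = migratingDomain;
                _vmName = migratingDomain.getName();
            }

            @Override
            public void run() {
                LOGGER.info("Interrupted migration of " + _vmName + ", aborting migration job");
                try {
                    // Ask libvirt to abort the domain's current background job (the migration).
                    if (_migratingDomain.abortJob() == 0) {
                        LOGGER.warn("Aborted migration job for " + _vmName);
                    }
                } catch (Exception ex) {
                    // Never let a failed abort propagate into the disconnect handler.
                    LOGGER.warn("Failed to abort migration job for " + _vmName, ex);
                }
            }
        }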
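    Because disconnected() invokes each hook's run() method directly, the hook does not
    strictly need to extend Thread; a plain Runnable is enough. The following is a minimal
    sketch under that assumption, with an added guard so the abort is attempted at most once.
    AbortableDomain and MigrationCancelTask are hypothetical names: AbortableDomain stands in
    for the getName() and abortJob() methods of libvirt-java's org.libvirt.Domain so the
    example runs without a hypervisor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the two org.libvirt.Domain methods the hook uses,
// so the sketch runs without a libvirt connection.
interface AbortableDomain {
    String getName();
    int abortJob() throws Exception;
}

// Hypothetical variant of MigrationCancelHook that implements Runnable instead of
// extending Thread; disconnected() runs it on the caller's thread either way.
class MigrationCancelTask implements Runnable {
    private final AbortableDomain migratingDomain;
    private final String vmName;
    private final AtomicBoolean fired = new AtomicBoolean(false);

    MigrationCancelTask(AbortableDomain migratingDomain) {
        this.migratingDomain = migratingDomain;
        this.vmName = migratingDomain.getName(); // resolved once, like the original hook
    }

    @Override
    public void run() {
        // Attempt the abort at most once, even if the hook is triggered twice.
        if (!fired.compareAndSet(false, true)) {
            return;
        }
        try {
            int rc = migratingDomain.abortJob();
            // As in the original hook, a return value of 0 is treated as success.
            if (rc == 0) {
                System.out.println("Aborted migration job for " + vmName);
            } else {
                System.out.println("abortJob returned " + rc + " for " + vmName);
            }
        } catch (Exception e) {
            // Never let a failed abort escape into the disconnect handler.
            System.out.println("Failed to abort migration job for " + vmName + ": " + e);
        }
    }

    boolean hasFired() {
        return fired.get();
    }
}
```

    In the agent, the task would wrap (or directly take) the real Domain, be passed to
    addDisconnectHook() before the migration starts, and be passed to removeDisconnectHook()
    in a finally block once the migration returns.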

