Return-Path: X-Original-To: apmail-cloudstack-dev-archive@www.apache.org Delivered-To: apmail-cloudstack-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 13CC1100E0 for ; Fri, 26 Apr 2013 04:56:02 +0000 (UTC) Received: (qmail 42207 invoked by uid 500); 26 Apr 2013 04:56:01 -0000 Delivered-To: apmail-cloudstack-dev-archive@cloudstack.apache.org Received: (qmail 42163 invoked by uid 500); 26 Apr 2013 04:56:01 -0000 Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list dev@cloudstack.apache.org Received: (qmail 42140 invoked by uid 99); 26 Apr 2013 04:56:01 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 26 Apr 2013 04:56:01 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of shadowsor@gmail.com designates 209.85.220.176 as permitted sender) Received: from [209.85.220.176] (HELO mail-vc0-f176.google.com) (209.85.220.176) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 26 Apr 2013 04:55:55 +0000 Received: by mail-vc0-f176.google.com with SMTP id hf12so3422981vcb.7 for ; Thu, 25 Apr 2013 21:55:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=aJ9SBJpOP5EnPZPGcL+jg4bFynpzCAsxr6nWLvwblCo=; b=DYUnrDcn1mVjsaUIhr9WPAM1LzzkfxG5Chv/2xtFgY9XjGQpefRssLbt4N6E1pWZEy ShSqnZ3vbHyrTnpg2tKATcaZSq+olpe7QGlPXuz121PtM1eOCBn54GezSxT9rF8j0LpP NNMIwwIVyg6v7LAGgr7tIwRZAoE1kbA05qRl4+8X8LnCzzvIZX0GxFP138CSwXJXluvs DUM4oRermR/+WRYW5poTwkA15U/x6IIDJDwbIX23doMbR1Dyv1uXXbKuHylo9s5Qiew9 ko9yJm47zcUyV8zoFzBgbG5IHApOkEvNxxuyiwWq+QlpEdpOm5f80QrDRmQiwqBZhPxK 3VIg== MIME-Version: 1.0 X-Received: by 10.220.153.69 with SMTP id j5mr28179370vcw.35.1366952134122; Thu, 25 Apr 2013 21:55:34 -0700 (PDT) Received: by 10.52.103.106 with HTTP; Thu, 25 Apr 2013 21:55:34 -0700 (PDT) In-Reply-To: References: <20130425200640.GX64138@USLT-205755.sungardas.corp> Date: Thu, 25 Apr 2013 22:55:34 -0600 Message-ID: Subject: Re: [ACS41] new critical bug From: Marcus Sorensen To: "dev@cloudstack.apache.org" Content-Type: multipart/alternative; boundary=f46d043c7ca661263204db3c582f X-Virus-Checked: Checked by ClamAV on apache.org --f46d043c7ca661263204db3c582f Content-Type: text/plain; charset=ISO-8859-1 Ok, review request in. I applied it to master, but I think there's a whitespace issue or something that keeps it from applying to 4.1. It's basically Edison's patch plus a logger message in VirtualMachineManagerImpl.java that tells the admin why stopping the VM failed (unable to connect to agent). On Thu, Apr 25, 2013 at 10:13 PM, Marcus Sorensen wrote: > argh, that version of stopVirtualMachine (the one returning boolean) isn't > even used! > > > On Thu, Apr 25, 2013 at 10:07 PM, Marcus Sorensen wrote: > >> Actually it's that your patch is for the 'forced' stop. Still doesn't >> apply to 4.1, but the code I'm looking at for the moment is the non-forced >> stop, which returns a boolean. >> >> >> On Thu, Apr 25, 2013 at 10:05 PM, Marcus Sorensen wrote: >> >>> Uh, nevermind, of course I can't return null there because it requires a >>> boolean in 4.1. I'll play with this a bit and get back with you. >>> >>> >>> On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen wrote: >>> >>>> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same >>>> as the first hunk of my patch above, just that it returns null instead of >>>> throwing an exception. I'm going to test that now. I didn't see why that >>>> CloudRuntimeException in the first hunk would be squelched, but the finally >>>> clause makes sense in the second hunk. >>>> >>>> >>>> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su wrote: >>>> >>>>> This code will do the magic: >>>>> } finally { >>>>> if (!stopped) { >>>>> if (!forced) { >>>>> s_logger.warn("Unable to stop vm " + vm); >>>>> try { >>>>> stateTransitTo(vm, Event.OperationFailed, >>>>> vm.getHostId()); >>>>> } catch (NoTransitionException e) { >>>>> s_logger.warn("Unable to transition the state >>>>> " + vm); >>>>> } >>>>> return false; >>>>> } else { >>>>> s_logger.warn("Unable to actually stop " + vm + " >>>>> but continue with release because it's a force stop"); >>>>> vmGuru.finalizeStop(profile, answer); >>>>> } >>>>> } >>>>> } >>>>> >>>>> >>>>> Basically, it means, if stop failed and it's not force stop, then mark >>>>> the vm as running. >>>>> I think the logic here sounds correct, as cloudstack can't delete the >>>>> vm(due to the connection between mgt server and kvm agent is broken), then >>>>> all the operations on the VM should fail, and the VM status shouldn't be >>>>> changed. >>>>> >>>>> But the problem is, the stop api call should fail, as stop vm actually >>>>> failed. >>>>> Could you try the following patch: >>>>> >>>>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> b/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> index 7bf04ec..ff20c54 100755 >>>>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends >>>>> ManagerBase implements UserVmManager, Use >>>>> } >>>>> >>>>> UserVO user = _userDao.findById(userId); >>>>> - >>>>> + boolean status = false; >>>>> try { >>>>> VirtualMachineEntity vmEntity = >>>>> _orchSrvc.getVirtualMachine(vm.getUuid()); >>>>> - vmEntity.stop(new Long(userId).toString()); >>>>> + status = vmEntity.stop(new Long(userId).toString()); >>>>> + if (status) { >>>>> + return _vmDao.findById(vmId); >>>>> + } else { >>>>> + return null; >>>>> + } >>>>> } catch (ResourceUnavailableException e) { >>>>> throw new CloudRuntimeException( >>>>> "Unable to contact the agent to stop the virtual >>>>> machine " >>>>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends >>>>> ManagerBase implements UserVmManager, Use >>>>> + vm, e); >>>>> } >>>>> >>>>> - return _vmDao.findById(vmId); >>>>> + >>>>> } >>>>> >>>>> @Override >>>>> >>>>> >>>>> > -----Original Message----- >>>>> > From: Marcus Sorensen [mailto:shadowsor@gmail.com] >>>>> > Sent: Thursday, April 25, 2013 4:46 PM >>>>> > To: dev@cloudstack.apache.org >>>>> > Subject: Re: [ACS41] new critical bug >>>>> > >>>>> > I tried a few things, including throwing a CloudRuntimeException in >>>>> several >>>>> > places where I thought it made sense, such as an empty >>>>> > AgentUnavailableException catch block, but it doesn't seem to do >>>>> anything at >>>>> > all, I don't see it in the logs, even though I do see the debug code >>>>> that I >>>>> > placed next to it. So I give up for now :-) >>>>> > >>>>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> > index 7bf04ec..0373690 100755 >>>>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java >>>>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends >>>>> > ManagerBase implements UserVmManager, Use >>>>> > if (status) { >>>>> > return status; >>>>> > } else { >>>>> > - return status; >>>>> > + throw new CloudRuntimeException( >>>>> > + "Unable to stop the virtual machine" + vm ); >>>>> > } >>>>> > } >>>>> > >>>>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl. >>>>> > index 2c2986f..e320ff1 100755 >>>>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java >>>>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends >>>>> > ManagerBase implements VirtualMac >>>>> > vmGuru.finalizeStop(profile, answer); >>>>> > >>>>> > } catch (AgentUnavailableException e) { >>>>> > + s_logger.error("Unable to stop vm, agent unavailable: " >>>>> + >>>>> > e.toString()); >>>>> > + throw new CloudRuntimeException("Unable to stop vm, >>>>> agent >>>>> > unavailable"); >>>>> > } catch (OperationTimedoutException e) { >>>>> > + s_logger.error("Unable to stop vm, operation timed out: >>>>> " + >>>>> > e.toString()); >>>>> > + throw new CloudRuntimeException("Unable to stop vm, >>>>> > + operation >>>>> > timed out"); >>>>> > } finally { >>>>> > if (!stopped) { >>>>> > if (!forced) { >>>>> > >>>>> > >>>>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers >>>>> > wrote: >>>>> > >>>>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote: >>>>> > > > I didn't mark this one as a blocker, but it's still pretty bad >>>>> for >>>>> > > someone >>>>> > > > managing VMs in an automated fashion. Trying to stop a VM when >>>>> the >>>>> > > > KVM agent is disconnected reports success. >>>>> > > > >>>>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195 >>>>> > > >>>>> > > I'll pull a fix in, if we have one ready before the final blockers. >>>>> > > Otherwise I'd pull it into a 4.1.1 release. >>>>> > > >>>>> >>>> >>>> >>> >> > --f46d043c7ca661263204db3c582f--