Date: Wed, 5 Apr 2017 19:29:41 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: cloudstack-issues@incubator.apache.org
Subject: [jira] [Commented] (CLOUDSTACK-9857) CloudStack KVM Agent Self Fencing - improper systemd config

[ https://issues.apache.org/jira/browse/CLOUDSTACK-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957501#comment-15957501 ]

ASF GitHub Bot commented on CLOUDSTACK-9857:
--------------------------------------------

Github user wido commented on the issue:

    https://github.com/apache/cloudstack/pull/2024

    LGTM


> CloudStack KVM Agent Self Fencing - improper systemd config
> ------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9857
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9857
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>          Components: KVM
>    Affects Versions: 4.5.2
>            Reporter: Abhinandan Prateek
>            Assignee: Abhinandan Prateek
>            Priority: Critical
>             Fix For: 4.10.0.0
>
>
> We had a database outage a few days ago and noticed that most of the CloudStack KVM agents committed suicide and never retried the connection. Moreover, we had Puppet set up to restart the cloudstack-agent daemon when it goes into the failed state, but apparently it never does go to the “failed” state.
> 2017-03-30 04:07:50,720 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Request:Seq -1--1: { Cmd , MgmtId: -1, via: -1, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ReadyCommand":{"_details":"com.cloud.utils.exception.CloudRuntimeException: DB Exception on: null","wait":0}}] }
> 2017-03-30 04:07:50,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: com.cloud.agent.api.ReadyCommand
> 2017-03-30 04:07:50,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Not ready to connect to mgt server: com.cloud.utils.exception.CloudRuntimeException: DB Exception on: null
> 2017-03-30 04:07:50,722 INFO [cloud.agent.Agent] (AgentShutdownThread:null) Stopping the agent: Reason = sig.kill
> 2017-03-30 04:07:50,723 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) Sending shutdown to management server
> While the agent fenced itself, for whatever reason its logic had, the systemd service did not exit properly.
> Here is what the status of cloudstack-agent looks like:
> [root@mqa6-kvm02 ~]# service cloudstack-agent status
> ● cloudstack-agent.service - SYSV: Cloud Agent
>    Loaded: loaded (/etc/rc.d/init.d/cloudstack-agent)
>    Active: active (exited) since Fri 2017-03-31 23:50:47 GMT; 12s ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 632 ExecStop=/etc/rc.d/init.d/cloudstack-agent stop (code=exited, status=0/SUCCESS)
>   Process: 654 ExecStart=/etc/rc.d/init.d/cloudstack-agent start (code=exited, status=0/SUCCESS)
>  Main PID: 441
> Mar 31 23:50:47 mqa6-kvm02 systemd[1]: Starting SYSV: Cloud Agent...
> Mar 31 23:50:47 mqa6-kvm02 cloudstack-agent[654]: Starting Cloud Agent:
> Mar 31 23:50:47 mqa6-kvm02 systemd[1]: Started SYSV: Cloud Agent.
> Mar 31 23:50:49 mqa6-kvm02 sudo[806]: root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
> The "Active: active (exited)" line should read "Active: failed (Result: exit-code)".
> Solution:
> The fix is to add a pidfile header to /etc/init.d/cloudstack-agent, like so:
> # chkconfig: 35 99 10
> # description: Cloud Agent
> + # pidfile: /var/run/cloudstack-agent.pid
> After that, if the agent dies, systemd catches it properly and the status looks as expected:
> [root@mqa6-kvm02 ~]# service cloudstack-agent status
> ● cloudstack-agent.service - SYSV: Cloud Agent
>    Loaded: loaded (/etc/rc.d/init.d/cloudstack-agent)
>    Active: failed (Result: exit-code) since Fri 2017-03-31 23:51:40 GMT; 7s ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 1124 ExecStop=/etc/rc.d/init.d/cloudstack-agent stop (code=exited, status=255)
>   Process: 949 ExecStart=/etc/rc.d/init.d/cloudstack-agent start (code=exited, status=0/SUCCESS)
>  Main PID: 975
> With this change, other tools can properly inspect the state of the daemon and take action when it fails, instead of it remaining in the "active (exited)" state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
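Why the pidfile header changes the reported state: the Docs line in both status outputs points to systemd-sysv-generator(8), which wraps /etc/rc.d/init.d/cloudstack-agent in a generated service unit. As far as we understand the generator, a script without a "# pidfile:" header gets Type=forking with RemainAfterExit=yes, so once "start" returns 0 the unit reports "active (exited)" whatever later happens to the agent process; with the header it gets PIDFile= and RemainAfterExit=no, so systemd follows the daemon's PID and the unit leaves the active state when the agent dies. The Python below is a simplified, hypothetical model of that one decision, not the generator's real code; the helpers parse_pidfile and sysv_unit_settings are our own names, and the BEFORE/AFTER strings are abbreviated stand-ins for the header shown in the report (the shebang line is assumed). Check the man page or the generator source for the authoritative behaviour.

```python
from __future__ import annotations

import re

# Simplified, hypothetical model of how systemd-sysv-generator turns an init
# script header into [Service] settings. This is NOT systemd's actual code.
PIDFILE_RE = re.compile(r"^#\s*pidfile:\s*(\S+)", re.IGNORECASE)


def parse_pidfile(script_text: str) -> str | None:
    """Return the path given on a '# pidfile:' header line, or None."""
    for line in script_text.splitlines():
        match = PIDFILE_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


def sysv_unit_settings(script_text: str) -> dict[str, str]:
    """Approximate only the [Service] keys that matter for this bug."""
    settings = {"Type": "forking"}
    pidfile = parse_pidfile(script_text)
    if pidfile is None:
        # Nothing to track: after a successful start the unit stays
        # "active (exited)", even if the agent process later dies.
        settings["RemainAfterExit"] = "yes"
    else:
        # systemd follows the PID stored in this file, so the unit leaves
        # the active state when the agent exits.
        settings["RemainAfterExit"] = "no"
        settings["PIDFile"] = pidfile
    return settings


# Hypothetical, abbreviated init script headers (shebang line assumed).
BEFORE = "#!/bin/bash\n# chkconfig: 35 99 10\n# description: Cloud Agent\n"
AFTER = BEFORE + "# pidfile: /var/run/cloudstack-agent.pid\n"

print(sysv_unit_settings(BEFORE))
# {'Type': 'forking', 'RemainAfterExit': 'yes'}
print(sysv_unit_settings(AFTER))
# {'Type': 'forking', 'RemainAfterExit': 'no', 'PIDFile': '/var/run/cloudstack-agent.pid'}
```

The model keeps only RemainAfterExit and PIDFile because those two keys are what separate the "active (exited)" output from the "failed (Result: exit-code)" output in the report.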
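The closing point, that other tools can now inspect the daemon's state and act when it fails, can be sketched in Python as a stand-in for the Puppet rule described in the report. This assumes the host's systemctl supports "systemctl show UNIT -p PROPERTY", which prints Key=Value lines; ActiveState and SubState are standard unit properties, and the broken case above corresponds to ActiveState=active with SubState=exited. The helper names (parse_show, needs_restart, unit_state, check_and_restart) and the restart-only-when-failed policy are hypothetical.

```python
from __future__ import annotations

import subprocess

UNIT = "cloudstack-agent"  # service name taken from the report


def parse_show(output: str) -> dict[str, str]:
    """Parse the Key=Value lines printed by 'systemctl show'."""
    props: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key] = value
    return props


def needs_restart(props: dict[str, str]) -> bool:
    """Hypothetical policy: act only when systemd reports the unit as failed."""
    return props.get("ActiveState") == "failed"


def unit_state(unit: str = UNIT) -> dict[str, str]:
    """Ask systemd for ActiveState/SubState (requires systemctl on the host)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "-p", "ActiveState", "-p", "SubState"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_show(result.stdout)


def check_and_restart(unit: str = UNIT) -> bool:
    """Restart the unit if it is failed; return True when a restart was issued."""
    if needs_restart(unit_state(unit)):
        subprocess.run(["systemctl", "restart", unit], check=True)
        return True
    return False
```

check_and_restart() could be run from cron or a configuration-management hook. Before the pidfile fix it would never trigger, because a dead agent still showed ActiveState=active.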