cloudstack-issues mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-9857) CloudStack KVM Agent Self Fencing - improper systemd config
Date Thu, 20 Apr 2017 14:04:04 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976738#comment-15976738 ]

ASF subversion and git services commented on CLOUDSTACK-9857:
-------------------------------------------------------------

Commit 9cc3ae8a942122ba3384b348376c6a948a2a74cc in cloudstack's branch refs/heads/49-to-master
from [~rajanik]
[ https://gitbox.apache.org/repos/asf?p=cloudstack.git;h=9cc3ae8 ]

Merge release branch 4.9 to master

* 4.9:
  CLOUDSTACK-9857: With this change if agent dies the systemd will catch it properly and show
process as exited
  CLOUDSTACK-9805: Display VR list in network details
  CLOUDSTACK-9356: FIX Cannot add users in VPC VPN


> CloudStack KVM Agent Self Fencing  - improper systemd config
> ------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9857
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9857
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM
>    Affects Versions: 4.5.2
>            Reporter: Abhinandan Prateek
>            Assignee: Abhinandan Prateek
>            Priority: Critical
>             Fix For: 4.10.0.0
>
>
> We had a database outage a few days ago and noticed that most of the CloudStack KVM agents
shut themselves down and never tried to reconnect. Moreover, we had Puppet, which was supposed
to restart the cloudstack-agent daemon when it goes into the failed state, but apparently it
never enters the "failed" state.
> 2017-03-30 04:07:50,720 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Request:Seq
-1--1:  { Cmd , MgmtId: -1, via: -1, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ReadyCommand":{"_details":"com.cloud.utils.exception.CloudRuntimeException:
DB Exception on: null","wait":0}}] }
> 2017-03-30 04:07:50,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing
command: com.cloud.agent.api.ReadyCommand
> 2017-03-30 04:07:50,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Not ready
to connect to mgt server: com.cloud.utils.exception.CloudRuntimeException: DB Exception on:
null
> 2017-03-30 04:07:50,722 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) Stopping
the agent: Reason = sig.kill
> 2017-03-30 04:07:50,723 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) Sending
shutdown to management server
> While the agent fenced itself, for whatever reason its logic had, the agent's systemd service
did not exit properly.
> Here is what the status of cloudstack-agent looks like:
> [root@mqa6-kvm02 ~]# service cloudstack-agent status
> ● cloudstack-agent.service - SYSV: Cloud Agent
>    Loaded: loaded (/etc/rc.d/init.d/cloudstack-agent)
>    Active: active (exited) since Fri 2017-03-31 23:50:47 GMT; 12s ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 632 ExecStop=/etc/rc.d/init.d/cloudstack-agent stop (code=exited, status=0/SUCCESS)
>   Process: 654 ExecStart=/etc/rc.d/init.d/cloudstack-agent start (code=exited, status=0/SUCCESS)
>  Main PID: 441
> Mar 31 23:50:47 mqa6-kvm02 systemd[1]: Starting SYSV: Cloud Agent...
> Mar 31 23:50:47 mqa6-kvm02 cloudstack-agent[654]: Starting Cloud Agent:
> Mar 31 23:50:47 mqa6-kvm02 systemd[1]: Started SYSV: Cloud Agent.
> Mar 31 23:50:49 mqa6-kvm02 sudo[806]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/grep
InitiatorName= /etc/iscsi/initiatorname.iscsi
> The "Active: active (exited)" line should read "Active: failed (Result: exit-code)".
> Solution:
> The fix is to add a pidfile tag to /etc/init.d/cloudstack-agent,
> like so:
> # chkconfig: 35 99 10
> # description: Cloud Agent
> + # pidfile: /var/run/cloudstack-agent.pid
> After that, if the agent dies, systemd will catch it properly and the status will look as expected:
> [root@mqa6-kvm02 ~]# service cloudstack-agent status
> ● cloudstack-agent.service - SYSV: Cloud Agent
>    Loaded: loaded (/etc/rc.d/init.d/cloudstack-agent)
>    Active: failed (Result: exit-code) since Fri 2017-03-31 23:51:40 GMT; 7s ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 1124 ExecStop=/etc/rc.d/init.d/cloudstack-agent stop (code=exited, status=255)
>   Process: 949 ExecStart=/etc/rc.d/init.d/cloudstack-agent start (code=exited, status=0/SUCCESS)
>  Main PID: 975
> With this change, other tools can properly inspect the state of the daemon and take
action when it has failed, instead of seeing it in the active (exited) state.
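
A tool of the kind described above could be sketched in shell roughly as follows. This is only an illustration, not part of the committed fix: the function names has_pidfile_tag and action_for_state are hypothetical, and the sketch assumes the tag format shown in the snippet above and that `systemctl is-active` prints a one-word state such as active or failed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given init script declares a
# "# pidfile: /path" header tag like the one added in the fix above.
has_pidfile_tag() {
    grep -Eq '^#[[:space:]]*pidfile:[[:space:]]*/' "$1"
}

# Hypothetical helper: map a one-word unit state (assumed to come from
# `systemctl is-active cloudstack-agent`) to a watchdog action.
action_for_state() {
    case "$1" in
        failed)                      echo restart ;;  # agent died, safe to restart
        active|activating|reloading) echo none ;;     # nothing to do
        *)                           echo inspect ;;  # inactive/unknown: needs a human
    esac
}

# Example wiring (assumption, left commented out so the sketch has no side effects):
#   has_pidfile_tag /etc/init.d/cloudstack-agent || echo "pidfile tag missing" >&2
#   state=$(systemctl is-active cloudstack-agent)
#   [ "$(action_for_state "$state")" = restart ] && systemctl restart cloudstack-agent
```

Without the pidfile tag, the unit stays in "active (exited)", so a check like this would most likely see the state as active and take no action, which matches the behaviour reported with Puppet.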



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
