cloudstack-issues mailing list archives

From "Nux (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CLOUDSTACK-10234) HA fails in cases of PSU failure.
Date Tue, 16 Jan 2018 17:08:00 GMT
Nux created CLOUDSTACK-10234:
--------------------------------

             Summary: HA fails in cases of PSU failure.
                 Key: CLOUDSTACK-10234
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10234
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.11.0.0
         Environment: 4.11 RC1, NFS storage, CentOS 7 management server and hypervisors
            Reporter: Nux


To simulate a PSU failure, I physically pulled the power from the server. HA failed to do
the right thing and did not move the affected VMs to other hypervisors.

I waited a good while, but nothing happened. The VM and VR running on the affected hypervisor
were never moved to another one (I have two other hypervisors running).

 

This is what I see in the management server logs:
{code:java}
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
    at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
2018-01-16 17:00:13,396 WARN  [o.a.c.alerts] (pool-5-thread-7:null) (logid:4f7299f6) AlertType:: 30 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: HA Fencing of host id=1, in dc id=1 performed
2018-01-16 17:00:15,375 DEBUG [c.c.a.t.Request] (pool-2-thread-27:null) (logid:6b21a8c1) Seq 5-9115285645797884785: Sending  { Cmd , MgmtId: 161334379813, via: 5(hv03.cloud.local), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"598d48ef-158d-3e14-ad68-8d02c9368ddf-LibvirtComputingResource","privateNetwork":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false},"publicNetwork":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false}},"wait":20}}] }
2018-01-16 17:00:15,380 DEBUG [c.c.a.t.Request] (pool-2-thread-5:null) (logid:bb993597) Seq 4-6582855280332112812: Sending  { Cmd , MgmtId: 161334379813, via: 4(hv02.cloud.local), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"6ebb3010-9c49-3a9c-b620-ecbc9731aca2-LibvirtComputingResource","privateNetwork":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false},"publicNetwork":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false}},"wait":20}}] }
2018-01-16 17:00:15,423 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) (logid:) Seq 5-9115285645797884785: Processing:  { Ans: , MgmtId: 161334379813, via: 5, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"Heart is beating...","wait":0}}] }
2018-01-16 17:00:15,423 DEBUG [c.c.a.t.Request] (pool-2-thread-27:null) (logid:6b21a8c1) Seq 5-9115285645797884785: Received:  { Ans: , MgmtId: 161334379813, via: 5(hv03.cloud.local), Ver: v1, Flags: 10, { Answer } }
2018-01-16 17:00:15,423 DEBUG [c.c.a.m.AgentManagerImpl] (pool-2-thread-27:null) (logid:6b21a8c1) Details from executing class com.cloud.agent.api.CheckOnHostCommand: Heart is beating...
2018-01-16 17:00:15,427 DEBUG [c.c.a.t.Request] (AgentManager-Handler-6:null) (logid:) Seq 4-6582855280332112812: Processing:  { Ans: , MgmtId: 161334379813, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"Heart is beating...","wait":0}}] }
2018-01-16 17:00:15,427 DEBUG [c.c.a.t.Request] (pool-2-thread-5:null) (logid:bb993597) Seq 4-6582855280332112812: Received:  { Ans: , MgmtId: 161334379813, via: 4(hv02.cloud.local), Ver: v1, Flags: 10, { Answer } }
2018-01-16 17:00:15,427 DEBUG [c.c.a.m.AgentManagerImpl] (pool-2-thread-5:null) (logid:bb993597) Details from executing class com.cloud.agent.api.CheckOnHostCommand: Heart is beating...
2018-01-16 17:00:16,217 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-d9c2c841) (logid:1b093681) Begin cleanup expired async-jobs
2018-01-16 17:00:16,218 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-d9c2c841) (logid:1b093681) End cleanup expired async-jobs
2018-01-16 17:00:17,392 WARN  [o.a.c.o.PowerOperationTask] (pool-6-thread-29:null) (logid:f9788c38) Out-of-band management background task operation=STATUS for host id=1 failed with: Out-of-band Management action (STATUS) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,422 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl] (pool-5-thread-6:ctx-65225bcc) (logid:665de20f) Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,438 WARN  [o.a.c.k.h.KVMHAProvider] (pool-5-thread-6:ctx-65225bcc) (logid:665de20f) OOBM service is not configured or enabled for this host hv01.cloud.local error is Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,438 WARN  [o.a.c.h.t.BaseHATask] (pool-5-thread-9:null) (logid:ff44841a) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host hv01.cloud.local
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host hv01.cloud.local
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
    at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
    at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
2018-01-16 17:00:17,439 WARN  [o.a.c.alerts] (pool-5-thread-9:null) (logid:ff44841a) AlertType:: 30 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: HA Fencing of host id=1, in dc id=1 performed
2018-01-16 17:00:17,903 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-ccb33721) (logid:722404aa) Zone 1 is ready to launch secondary storage VM
2018-01-16 17:00:17,935 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-22a69a02) (logid:393fab21) Zone 1 is ready to launch console proxy
{code}
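
For anyone trying to reproduce this, the failed out-of-band management (OOBM) actions can be
pulled out of a management server log with a small script. The following is only a rough
sketch, not part of CloudStack: the names ENTRY_START, OOBM_FAILURE and oobm_failures are
made up for illustration, and the patterns assume the entry layout and message wording seen
in the log excerpt above.

```python
import re

# Start of a log entry as seen in the excerpt: timestamp, level, [logger].
# Assumption: every entry in the log follows this layout.
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+\[([^\]]+)\]"
)

# Message wording copied from the excerpt; captures the action and the host UUID.
OOBM_FAILURE = re.compile(
    r"Out-of-band Management action \((\w+)\) on host \(([0-9a-f-]+)\) failed"
)


def oobm_failures(log_text):
    """Return (timestamp, level, logger, action, host_uuid) for each log
    entry that reports a failed OOBM action."""
    # Collapse all whitespace so entries re-wrapped by a mail client still match.
    text = " ".join(log_text.split())
    starts = list(ENTRY_START.finditer(text))
    results = []
    for i, match in enumerate(starts):
        # An entry's body runs until the next entry header (or end of text),
        # so stack traces stay attached to the entry that logged them.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        hit = OOBM_FAILURE.search(text[match.end():end])
        if hit:
            results.append(
                (match.group(1), match.group(2), match.group(3),
                 hit.group(1), hit.group(2))
            )
    return results
```

Calling oobm_failures() on the contents of the management server log returns one tuple per
entry that mentions a failed OOBM action, which makes it easier to see that both the STATUS
and OFF actions fail once the host (and its BMC) has lost power.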



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
