Date: Mon, 29 Jul 2013 05:23:49 +0000 (UTC)
From: "Koushik Das (JIRA)"
To: cloudstack-issues@incubator.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [jira] [Resolved] (CLOUDSTACK-2428) HA - When the master host is disconnected, the host status continues to remain in "Up" state because of com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koushik Das resolved CLOUDSTACK-2428.
-------------------------------------
    Resolution: Cannot Reproduce

> HA - When the master host is disconnected, the host status continues to remain in "Up" state because of com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave
> ----------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-2428
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2428
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Build from pvaln
>            Reporter: Sangeetha Hariharan
>            Assignee: Koushik Das
>            Priority: Critical
>             Fix For: 4.2.0
>
>         Attachments: logs_7_29, logs.rar
>
>
> 1. Advanced zone with 1 cluster with 2 hosts. Create a shared network with private VLAN.
> 2. Deploy a few HA-enabled VMs in this network.
> 3. Pull the network cable for one of the hosts.
> When CloudStack detects that the host is disconnected, it is not able to put the host in the Disconnected state and start HA for the VMs that are HA enabled.
> I see the following exception in the management server logs:
> 2013-05-09 17:15:55,576 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-267:null) Seq 1-1435828229: Executing request
> 2013-05-09 17:15:55,602 DEBUG [xen.resource.XenServerConnectionPool] (DirectAgent-267:null) Catch Exception: com.xensource.xenapi.Types$HostOffline Host is offline 10.223.81.62 due to You attempted an operation which involves a host which could not be contacted.
> 2013-05-09 17:15:55,603 DEBUG [xen.resource.XenServerConnectionPool] (DirectAgent-267:null) Trying to reset master of slave 10.223.81.62 to 10.223.81.61
> 2013-05-09 17:16:02,319 WARN [xen.resource.CitrixResourceBase] (DirectAgent-265:null) can not ping xenserver 520d4994-8b1f-4dda-b51d-2ee63750abf6
> 2013-05-09 17:16:02,319 WARN [agent.manager.DirectAgentAttache] (DirectAgent-265:null) Unable to get current status on 1
> 2013-05-09 17:16:02,321 INFO [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) Investigating why host 1 has disconnected with event AgentDisconnected
> 2013-05-09 17:16:02,321 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-11:null) checking if agent (1) is alive
> 2013-05-09 17:16:02,323 DEBUG [agent.transport.Request] (AgentTaskPool-11:null) Seq 1-1435828294: Sending { Cmd , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
> 2013-05-09 17:16:02,323 DEBUG [agent.transport.Request] (AgentTaskPool-11:null) Seq 1-1435828294: Executing: { Cmd , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
> 2013-05-09 17:16:02,323 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-271:null) Seq 1-1435828294: Executing request
> 2013-05-09 17:16:04,035 DEBUG [agent.manager.AgentAttache] (AgentTaskPool-10:null) Seq 6-474349576: Waiting some more time because this is the current command
> 2013-05-09 17:16:04,040 DEBUG [xen.resource.XenServerConnectionPool] (DirectAgent-268:null) localLogout has problem Failed to read server's response: connect timed out
> 2013-05-09 17:16:04,040 WARN [agent.manager.DirectAgentAttache] (DirectAgent-268:null) Seq 1-1435828292: Exception Caught while executing command
> com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:1682)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:524)
>     at com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
>     at com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
>     at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
> 2013-05-09 17:16:04,041 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-268:null) Seq 1-1435828292: Response Received:
> 2013-05-09 17:16:04,041 DEBUG [agent.transport.Request] (DirectAgent-268:null) Seq 1-1435828292: Processing: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out","wait":0}}] }
> 2013-05-09 17:16:04,041 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) Seq 1-1435828292: Received: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 10, { Answer } }
> 2013-05-09 17:16:04,041 DEBUG [cloud.ha.AbstractInvestigatorImpl] (AgentTaskPool-5:null) host (10.223.81.50) cannot be pinged, returning null ('I don't know')
> 2013-05-09 17:16:04,041 DEBUG [cloud.ha.UserVmDomRInvestigator] (AgentTaskPool-5:null) sending ping from (5) to agent's host ip address (10.223.81.50)
> 2013-05-09 17:16:04,043 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) Seq 5-2082341067: Sending { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.223.81.50","wait":20}}] }
> 2013-05-09 17:16:04,043 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) Seq 5-2082341067: Executing: { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.223.81.50","wait":20}}] }
> 2013-05-09 17:16:04,043 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-272:null) Seq 5-2082341067: Executing request
> 2013-05-09 17:16:04,053 DEBUG [xen.resource.XenServerConnectionPool] (DirectAgent-91:null) localLogout has problem Failed to read server's response: connect timed out
> 2013-05-09 17:16:04,053 WARN [agent.manager.DirectAgentAttache] (DirectAgent-91:null) Seq 1-1435828293: Exception Caught while executing command
> com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:1682)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:524)
>     at com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
>     at com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
>     at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
> 2013-05-09 17:16:04,054 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-91:null) Seq 1-1435828293: Response Received:
> 2013-05-09 17:16:04,054 DEBUG [agent.transport.Request] (DirectAgent-91:null) Seq 1-1435828293: Processing: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out","wait":0}}] }
> 2013-05-09 17:16:04,055 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) Seq 1-1435828293: Received: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 10, { Answer } }
> 2013-05-09 17:16:04,055 DEBUG [cloud.ha.AbstractInvestigatorImpl] (AgentTaskPool-7:null) host (10.223.81.52) cannot be pinged, returning null ('I don't know')
> 2013-05-09 17:16:04,055 DEBUG [cloud.ha.UserVmDomRInvestigator] (AgentTaskPool-7:null) sending ping from (5) to agent's host ip address (10.223.81.52)
> 2013-05-09 17:16:04,057 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) Seq 5-2082341068: Sending { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.223.81.52","wait":20}}] }
> 2013-05-09 17:16:04,057 DEBUG [agent.manager.AgentAttache] (AgentTaskPool-14:null) Seq 3-1752367195: Waiting some more time because this is the current command
> 2013-05-09 17:16:04,057 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) Seq 5-2082341068: Executing: { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, Flags: 100011, [{"PingTestCommand":{"_computingHostIp":"10.223.81.52","wait":20}}] }
> 2013-05-09 17:16:04,057 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-91:null) Seq 5-2082341068: Executing request
> 2013-05-09 17:16:05,175 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 1 is ready to launch secondary storage VM
> 2013-05-09 17:16:05,614 DEBUG [xen.resource.XenServerConnectionPool] (DirectAgent-267:null) localLogout has problem Failed to read server's response: connect timed out
> 2013-05-09 17:16:05,614 WARN [agent.manager.DirectAgentAttache] (DirectAgent-267:null) Seq 1-1435828229: Exception Caught while executing command
> com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:7725)
>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:570)
>     at com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
>     at com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
>     at com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
> 2013-05-09 17:16:05,615 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-267:null) Seq 1-1435828229: Response Received:
> 2013-05-09 17:16:05,615 DEBUG [agent.transport.Request] (DirectAgent-267:null) Seq 1-1435828229: Processing: { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, Flags: 10, [{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect timed out","wait":0}}] }
> 2013-05-09 17:16:05,704 DEB
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira