cloudstack-users mailing list archives

From Geoff Higginbottom <geoff.higginbot...@shapeblue.com>
Subject RE: Xen Host failure in pool
Date Fri, 10 Aug 2012 22:30:39 GMT
Nik,

Have you tried running the following command?

xe pool-emergency-transition-to-master

I've had a similar problem a few times, and this always resolves it.
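
A minimal sketch of how that command might be wrapped, assuming it is run as root on the one host that should take over as pool master. The follow-up `xe pool-recover-slaves` is not mentioned in this thread; it is included on the assumption that the remaining slaves still point at the old master. The function name `promote_to_master` and the `XE` override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hedged sketch: promote this host to pool master after the old master is lost.
# XE is a hypothetical override so the steps can be dry-run (e.g. XE=echo).
XE="${XE:-xe}"

promote_to_master() {
    # Step 1: force this host to become master (the command suggested above).
    $XE pool-emergency-transition-to-master || return 1
    # Step 2 (assumption, not from the thread): ask the slaves to follow the new master.
    $XE pool-recover-slaves
}
```

With `XE=echo` set, calling `promote_to_master` only prints the two xe invocations. Run it for real on a single host only, since ending up with two masters in one pool would make matters worse.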

Regards

Geoff


-----Original Message-----
From: Nik Martin [mailto:nik.martin@nfinausa.com]
Sent: 10 August 2012 23:01
To: cloudstack-users@incubator.apache.org
Subject: Re: Xen Host failure in pool

On 08/10/2012 04:48 PM, Caleb Call wrote:
> I've had this happen before and was unable to recover from it.  I eventually had to just rebuild my box.
>
> This doc may provide some help (I found it after my incident)
>
> http://support.citrix.com/servlet/KbServlet/download/17140-102-18520/XenServer%20System%20Recovery%20Guide.pdf
>
Thanks.  I have exhausted every xe command under the sun, and it appears
I will have to rebuild the server from scratch.  Before I do, do any
CloudStack developers need/want to take a look at my controller or
XenServer?  I have no clue why the server would have just disappeared,
so if there are any logs that may help, I'll be glad to supply them.

I know I can't have Hypervisors just disappearing in the middle of the
night forcing them to be rebuilt each time this happens!

Nik

>
> On Aug 10, 2012, at 11:59 AM, Anthony Xu <Xuefei.Xu@citrix.com> wrote:
>
>> Hi Nik
>>
>> What's the network configuration in XenServer host?
>> Is it bridge or openvswitch?
>>
>> You can get the info by
>> cat /etc/xensource/network.conf
>>
>> Anthony
>>
>>> -----Original Message-----
>>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>>> Sent: Friday, August 10, 2012 8:36 AM
>>> To: cloudstack-users@incubator.apache.org
>>> Subject: Re: Xen Host failure in pool
>>>
>>> On 08/10/2012 10:32 AM, Mice Xia wrote:
>>>>
>>>> As I recall, when a network partition happens, a pool slave may enter
>>>> emergency mode and show as offline because it could not reach its master
>>>> for a long time.
>>>> Could you check hv1's console (the graphical console, not the ssh console)
>>>> and check whether its NICs are shown correctly?
>>>>
>>>> Regards
>>>> Mice
>>>>
>>> No, when I went into the xsconsole and tried to review all the settings,
>>> it was not showing the management interfaces properly.
>>>
>>> --
>>> Regards,
>>>
>>> Nik
>>>
>>>> -----Original Message-----
>>>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>>>> Sent: 2012-8-10 (Friday) 23:04
>>>> To: cloudstack-users@incubator.apache.org
>>>> Subject: Xen Host failure in pool
>>>>
>>>> We have a Xenserver 6.2 based pool of three hosts running under
>>>> CloudStack Acton release (code base is about two weeks old).  We left
>>>> last night and everything was fine, and I have about 2 VMs running on
>>>> each host, not doing anything. This morning, I came in, and three VMs
>>>> had stopped. I logged into XenCenter to see what the pool looked
>>>> like: the pool master had changed from host HV3 to HV2, and HV1 was
>>>> offline.  I logged in to HV1's console and looked at
>>>> /var/log/messages, which was complaining about the pool master address
>>>> being wrong. I went into the CloudStack UI and deleted and re-added the
>>>> host; the add failed immediately, and I got this in the log when I did:
>>>>
>>>>
>>>> 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet]
>>>> (catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags=
>>>> 2012-08-10 09:56:39,573 INFO  [cloud.resource.ResourceManagerImpl]
>>>> (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in
>>>> data center 2
>>>> 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool]
>>>> (catalina-exec-24:null) Slave logon to 172.16.5.3
>>>> 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool]
>>>> (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to
>>>> The master says the host is not known to it. Perhaps the Host was
>>>> deleted from the master's database? Perhaps the slave is pointing to the
>>>> wrong master?
>>>> 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer]
>>>> (catalina-exec-24:null) other exceptions: java.lang.RuntimeException:
>>>> can not get master ip
>>>> java.lang.RuntimeException: can not get master ip
>>>>    at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
>>>>    at com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
>>>>    at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
>>>>    at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>>    at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>>    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>>    at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>>    at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>>    at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>>    at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>>    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>>    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>>    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>    at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl]
>>>> (catalina-exec-24:null) Unable to find the server resources at
>>>> http://172.16.5.3
>>>> 2012-08-10 09:56:39,642 WARN  [api.commands.AddHostCmd]
>>>> (catalina-exec-24:null) Exception:
>>>> com.cloud.exception.DiscoveryException: Unable to add the host
>>>>    at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
>>>>    at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>>    at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>>    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>>    at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>>    at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>>    at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>>    at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>>    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>>    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>>    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>    at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,642 WARN  [cloud.api.ApiDispatcher]
>>>> (catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable
>>>> to add the host
>>>> 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Ping from 17
>>>> 2012-08-10 09:56:43,822 DEBUG
>>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone
>>>> 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:56:43,916 DEBUG
>>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone
>>>> 2 is ready to launch console proxy
>>>> 2012-08-10 09:56:44,102 DEBUG
>>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-12:null) Ping from 22
>>>> 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-10:null) Ping from 18
>>>> 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) VmStatsCollector is running...
>>>> 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Seq 16-92408948: Executing request
>>>> 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase]
>>>> (DirectAgent-305:null) Vm cpu utilization 0.01
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Seq 16-92408948: Response Received:
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-305:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request]
>>>> (StatsCollector-1:null) Seq 16-92408948: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-1:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-497:null) Ping from 17
>>>> 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-8:null) Ping from 22
>>>> 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-9:null) Ping from 18
>>>> 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Ping from 16
>>>> 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 17
>>>> 2012-08-10 09:57:13,821 DEBUG
>>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone
>>>> 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:57:13,918 DEBUG
>>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone
>>>> 2 is ready to launch console proxy
>>>> 2012-08-10 09:57:14,102 DEBUG
>>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-11:null) Ping from 22
>>>> 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-3:null) HostStatsCollector is running...
>>>> 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Seq 16-92408949: Executing request
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Seq 16-92408949: Response Received:
>>>> 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-71:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request]
>>>> (StatsCollector-3:null) Seq 16-92408949: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Seq 17-665190891: Executing request
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Seq 17-665190891: Response Received:
>>>> 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-338:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request]
>>>> (StatsCollector-3:null) Seq 17-665190891: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-15:null) Ping from 18
>>>> 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Ping from 17
>>>> 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-1:null) Ping from 22
>>>> 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-294:null) Seq 16-92405762: Executing request
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-294:null) Seq 16-92405762: Response Received:
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request]
>>>> (DirectAgent-294:null) Seq 16-92405762: Processing:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10,
>>>> [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}]
>>>> }
>>>> 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-357:null) Seq 17-665190402: Executing request
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-357:null) Seq 17-665190402: Response Received:
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request]
>>>> (DirectAgent-357:null) Seq 17-665190402: Processing:  { Ans: , MgmtId:
>>>> 130577622632, via: 17, Ver: v1, Flags: 10,
>>>> [{"Answer":{"result":true,"wait":0}}] }
>>>>
>>>> This is a very serious error, and I don't know how to fix it.  Can
>>>> anyone suggest what might be the problem and how I might fix it?
>>>>
>>>>
>>>
>>>
>>
>


--
Regards,

Nik



