cloudstack-users mailing list archives

From Nik Martin <nik.mar...@nfinausa.com>
Subject Re: Xen Host failure in pool
Date Fri, 10 Aug 2012 22:01:16 GMT
On 08/10/2012 04:48 PM, Caleb Call wrote:
> I've had this happen before and was unable to recover from it. I eventually had to just
> rebuild my box.
> 
> This doc may provide some help (I found it after my incident)
> 
> http://support.citrix.com/servlet/KbServlet/download/17140-102-18520/XenServer%20System%20Recovery%20Guide.pdf
> 
Thanks. I have exhausted every xe command under the sun, and it appears
I will have to rebuild the server from scratch. Before I do, do any
CloudStack developers need or want to take a look at my controller or
XenServer host? I have no clue why the server would have just disappeared,
so if there are any logs that may help, I'll be glad to supply them.
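
Most of the management-server log excerpt quoted further down is routine Ping and stats traffic, which buries the actual failure. A minimal Python sketch that keeps only WARN/ERROR entries, assuming the one-entry-per-line layout `timestamp LEVEL [category] (thread) message` visible in that excerpt before the mail client wrapped it. `LogEntry` and `important_entries` are hypothetical helper names, not part of CloudStack.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# One log entry in the layout seen in the excerpt, e.g.
# "2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl] (catalina-exec-24:null) Unable to ..."
# Assumption: each entry occupies a single line (stack-trace "at ..." lines do not match).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"\[(?P<category>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s*"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    level: str
    category: str
    thread: str
    message: str


def important_entries(lines: Iterable[str], levels=("WARN", "ERROR", "FATAL")) -> Iterator[LogEntry]:
    """Yield parsed entries whose level is in `levels`; everything else is skipped."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") in levels:
            yield LogEntry(**m.groupdict())


if __name__ == "__main__":
    sample = [
        "2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl] "
        "(catalina-exec-24:null) Unable to find the server resources at http://172.16.5.3",
        "2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Ping from 17",
        "\tat java.lang.Thread.run(Thread.java:679)",
    ]
    for entry in important_entries(sample):
        print(entry.ts, entry.category, entry.message)
```

Filtering by level rather than by keyword keeps the WARN lines from `ResourceManagerImpl` and `AddHostCmd` that explain the failed add-host attempt.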

I know I can't have hypervisors just disappearing in the middle of the
night, forcing them to be rebuilt each time this happens!

Nik

> 
> On Aug 10, 2012, at 11:59 AM, Anthony Xu <Xuefei.Xu@citrix.com> wrote:
> 
>> Hi Nik
>>
>> What's the network configuration on the XenServer host?
>> Is it bridge or openvswitch?
>>
>> You can get the info with:
>> cat /etc/xensource/network.conf
>>
>> Anthony
>>
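
Anthony's check comes down to reading one file. A minimal Python sketch that interprets its contents, assuming (as his question implies) that `/etc/xensource/network.conf` holds a single keyword, `bridge` or `openvswitch`; `network_backend` is a hypothetical helper name.

```python
from pathlib import Path

NETWORK_CONF = Path("/etc/xensource/network.conf")  # path taken from Anthony's message
KNOWN_BACKENDS = {"bridge", "openvswitch"}


def network_backend(conf_text: str) -> str:
    """Return 'bridge' or 'openvswitch' from the contents of network.conf.

    Assumption: the file contains only the backend keyword, possibly with
    surrounding whitespace or a trailing newline.
    """
    backend = conf_text.strip().lower()
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unexpected network backend: {backend!r}")
    return backend


if __name__ == "__main__":
    # Meant to run in dom0 on the XenServer host; elsewhere the file will not exist.
    if NETWORK_CONF.exists():
        print(network_backend(NETWORK_CONF.read_text()))
    else:
        print(f"{NETWORK_CONF} not found; is this a XenServer host?")
```

Raising on anything unexpected, rather than guessing, makes a damaged or unfamiliar file obvious.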
>>> -----Original Message-----
>>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>>> Sent: Friday, August 10, 2012 8:36 AM
>>> To: cloudstack-users@incubator.apache.org
>>> Subject: Re: Xen Host failure in pool
>>>
>>> On 08/10/2012 10:32 AM, Mice Xia wrote:
>>>>
>>>> I remember that when a network partition happens, a pool slave may
>>>> enter emergency mode and show as offline because it could not reach
>>>> its master for a long time.
>>>> Could you check hv1's console (the graphical console, not an ssh
>>>> console) and check whether its NICs are shown correctly?
>>>>
>>>> Regards
>>>> Mice
>>>>
>>> No, when I went into the xsconsole and tried to review all the settings,
>>> it was not showing the management interfaces properly.
>>>
>>> --
>>> Regards,
>>>
>>> Nik
>>>
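
Mice's explanation can be written down as a toy model: a slave that has not heard from its master for longer than some threshold is treated as being in emergency mode and reported offline. This is a sketch of the idea only, not XenServer's actual logic; `PoolSlave`, `likely_offline`, the 300-second threshold and the timestamps are all hypothetical, since the thread only says "a long time".

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; the real XenServer value is not given in this thread.
MASTER_CONTACT_TIMEOUT_S = 300.0


@dataclass
class PoolSlave:
    """Toy model of a pool slave's view of its master (not XenServer's code)."""
    name: str
    last_master_contact: float  # time, in seconds, of the last successful contact with the master

    def in_emergency_mode(self, now: float, timeout: float = MASTER_CONTACT_TIMEOUT_S) -> bool:
        # Cut off from the master for longer than the timeout: per Mice's
        # description, the slave may drop into emergency mode and show as offline.
        return now - self.last_master_contact > timeout


def likely_offline(slaves: List[PoolSlave], now: float) -> List[str]:
    """Names of the slaves this toy model would report as offline at time `now`."""
    return [s.name for s in slaves if s.in_emergency_mode(now)]


# Made-up timestamps: hv1 last reached the master an hour ago, a second slave 100 s ago.
pool = [PoolSlave("hv1", last_master_contact=0.0), PoolSlave("hv-x", last_master_contact=3500.0)]
print(likely_offline(pool, now=3600.0))  # ['hv1']
```

The model explains why a partitioned slave can look dead in XenCenter while still being up, which is why Mice asks for the local graphical console rather than ssh.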
>>>> -----Original Message-----
>>>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>>>> Sent: 2012-8-10 (Friday) 23:04
>>>> To: cloudstack-users@incubator.apache.org
>>>> Subject: Xen Host failure in pool
>>>>
>>>> We have a XenServer 6.2 based pool of three hosts running under the
>>>> CloudStack Acton release (code base is about two weeks old). We left
>>>> last night and everything was fine, with about 2 VMs running on each
>>>> host, not doing anything. When I came in this morning, three VMs had
>>>> stopped. I logged into XenCenter to see what the pool looked like:
>>>> the pool master had changed from host HV3 to HV2, and HV1 was
>>>> offline. I logged in to HV1's console and looked at
>>>> /var/log/messages, and it was complaining about the pool master address
>>>> being wrong. I went into the CloudStack UI and deleted and re-added the
>>>> host; it failed immediately, and I got this in the log when I did:
>>>>
>>>>
>>>> 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet]
>>>> (catalina-exec-24:null) Invalid paramemter in URL found. param:
>>>> hosttags=
>>>> 2012-08-10 09:56:39,573 INFO  [cloud.resource.ResourceManagerImpl]
>>>> (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3
>>>> in data center 2
>>>> 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool]
>>>> (catalina-exec-24:null) Slave logon to 172.16.5.3
>>>> 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool]
>>>> (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due
>>>> to The master says the host is not known to it. Perhaps the Host was
>>>> deleted from the master's database? Perhaps the slave is pointing to
>>>> the wrong master?
>>>> 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer]
>>>> (catalina-exec-24:null) other exceptions: java.lang.RuntimeException:
>>>> can not get master ip
>>>> java.lang.RuntimeException: can not get master ip
>>>> 	at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
>>>> 	at com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
>>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
>>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>> 	at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>> 	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>> 	at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>> 	at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>> 	at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>> 	at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>> 	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>> 	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>> 	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>> 	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> 	at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl]
>>>> (catalina-exec-24:null) Unable to find the server resources at
>>>> http://172.16.5.3
>>>> 2012-08-10 09:56:39,642 WARN  [api.commands.AddHostCmd]
>>>> (catalina-exec-24:null) Exception:
>>>> com.cloud.exception.DiscoveryException: Unable to add the host
>>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
>>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>> 	at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>> 	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>> 	at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>> 	at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>> 	at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>> 	at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>> 	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>> 	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>> 	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>> 	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> 	at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,642 WARN  [cloud.api.ApiDispatcher]
>>>> (catalina-exec-24:null) class com.cloud.api.ServerApiException :
>>>> Unable to add the host
>>>> 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Ping from 17
>>>> 2012-08-10 09:56:43,822 DEBUG
>>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null)
>>>> Zone 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:56:43,916 DEBUG
>>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null)
>>>> Zone 2 is ready to launch console proxy
>>>> 2012-08-10 09:56:44,102 DEBUG
>>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-12:null) Ping from 22
>>>> 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-10:null) Ping from 18
>>>> 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-1:null) VmStatsCollector is running...
>>>> 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Seq 16-92408948: Executing request
>>>> 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase]
>>>> (DirectAgent-305:null) Vm cpu utilization 0.01
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-305:null) Seq 16-92408948: Response Received:
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-305:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request]
>>>> (StatsCollector-1:null) Seq 16-92408948: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-1:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-497:null) Ping from 17
>>>> 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-8:null) Ping from 22
>>>> 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-9:null) Ping from 18
>>>> 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Ping from 16
>>>> 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 17
>>>> 2012-08-10 09:57:13,821 DEBUG
>>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null)
>>>> Zone 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:57:13,918 DEBUG
>>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null)
>>>> Zone 2 is ready to launch console proxy
>>>> 2012-08-10 09:57:14,102 DEBUG
>>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-11:null) Ping from 22
>>>> 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector]
>>>> (StatsCollector-3:null) HostStatsCollector is running...
>>>> 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Seq 16-92408949: Executing request
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Seq 16-92408949: Response Received:
>>>> 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-71:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request]
>>>> (StatsCollector-3:null) Seq 16-92408949: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Seq 17-665190891: Executing request
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Seq 17-665190891: Response Received:
>>>> 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (DirectAgent-338:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request]
>>>> (StatsCollector-3:null) Seq 17-665190891: Received:  { Ans: , MgmtId:
>>>> 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-15:null) Ping from 18
>>>> 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-71:null) Ping from 17
>>>> 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl]
>>>> (AgentManager-Handler-1:null) Ping from 22
>>>> 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-294:null) Seq 16-92405762: Executing request
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-294:null) Seq 16-92405762: Response Received:
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request]
>>>> (DirectAgent-294:null) Seq 16-92405762: Processing:  { Ans: , MgmtId:
>>>> 130577622632, via: 16, Ver: v1, Flags: 10,
>>>> [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}]
>>>> }
>>>> 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-357:null) Seq 17-665190402: Executing request
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache]
>>>> (DirectAgent-357:null) Seq 17-665190402: Response Received:
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request]
>>>> (DirectAgent-357:null) Seq 17-665190402: Processing:  { Ans: , MgmtId:
>>>> 130577622632, via: 17, Ver: v1, Flags: 10,
>>>> [{"Answer":{"result":true,"wait":0}}] }
>>>>
>>>> This is a very serious error, and I don't know how to fix it. Can
>>>> anyone suggest what the problem might be and how I might fix it?
>>>>
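
Both HV1's /var/log/messages and the CloudStack log above point at the same question: which master does the slave think it has? On a XenServer host that is recorded in `/etc/xensource/pool.conf`, which contains `master` on the pool master and `slave:<master address>` on a slave, and `xe pool-emergency-reset-master master-address=<address>` run on a slave re-points it at a new master. A minimal Python sketch of that check, which only builds the command and does not run it; `parse_pool_conf`, `reset_master_command` and the 192.0.2.x addresses are hypothetical, and the thread does not confirm this would have recovered HV1.

```python
from typing import List, Optional, Tuple


def parse_pool_conf(text: str) -> Tuple[str, Optional[str]]:
    """Parse /etc/xensource/pool.conf contents into (role, master_address).

    Assumed format: 'master' on the pool master, 'slave:<master address>' on a slave.
    """
    value = text.strip()
    if value == "master":
        return "master", None
    role, sep, address = value.partition(":")  # split on the first ':' only
    if role == "slave" and sep and address:
        return "slave", address
    raise ValueError(f"unrecognised pool.conf contents: {value!r}")


def reset_master_command(pool_conf_text: str, current_master: str) -> Optional[List[str]]:
    """Return the xe argv that would re-point a slave at current_master, or None.

    None means the host is itself the master or already points at current_master.
    The command is only built here; confirm which host really is the master
    (HV2 in this thread) before running it on the slave.
    """
    role, configured = parse_pool_conf(pool_conf_text)
    if role == "master" or configured == current_master:
        return None
    return ["xe", "pool-emergency-reset-master", f"master-address={current_master}"]


# Hypothetical addresses: the slave still points at the old master (.13), the new one is .12.
print(reset_master_command("slave:192.0.2.13\n", "192.0.2.12"))
```

Returning an argv list rather than calling `subprocess` keeps the destructive step a deliberate, manual one.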
>>>>
>>>
>>>
>>
> 


-- 
Regards,

Nik

