cloudstack-users mailing list archives

From Nik Martin <nik.mar...@nfinausa.com>
Subject Re: Xen Host failure in pool
Date Fri, 10 Aug 2012 18:37:37 GMT
On 08/10/2012 12:59 PM, Anthony Xu wrote:
> Hi Nik
> 
> What's the network configuration in XenServer host?
> Is it bridge or openvswitch?
> 
Anthony,

I'm using CloudStack advanced networking, so the network backend is
openvswitch. I did the emergency network reset and the XenServer host
rebooted, but when it came back the result was the same: no NICs were
reported.
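
For anyone following along, the two checks discussed in this thread (which
network backend is configured, and which NICs xapi reports) can be wrapped
in a small script. This is only a minimal sketch under assumptions: the
function names network_backend and list_pifs are made up for illustration,
and it assumes /etc/xensource/network.conf contains nothing but the backend
name ("bridge" or "openvswitch").

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of XenServer.

# Print the configured network backend ("bridge" or "openvswitch").
# The optional argument overrides the default path, which is handy for testing.
network_backend() {
    conf="${1:-/etc/xensource/network.conf}"
    if [ ! -r "$conf" ]; then
        echo "unknown"
        return 1
    fi
    # Assumption: the file holds a single word; strip all whitespace.
    tr -d '[:space:]' < "$conf"
    echo
}

# List the physical interfaces (PIFs) that xapi knows about.
# Only meaningful on a XenServer host where the xe CLI is installed.
list_pifs() {
    if command -v xe >/dev/null 2>&1; then
        xe pif-list params=uuid,device,management,IP,currently-attached
    else
        echo "xe not found; run this on the XenServer host" >&2
        return 1
    fi
}
```

On the affected host, running network_backend followed by list_pifs should
show whether the backend matches what CloudStack expects and whether the
management interface is still known to xapi.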

-- 
Regards,

Nik


> You can get the info by running:
> cat /etc/xensource/network.conf
> 
> Anthony
> 
>> -----Original Message-----
>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>> Sent: Friday, August 10, 2012 8:36 AM
>> To: cloudstack-users@incubator.apache.org
>> Subject: Re: Xen Host failure in pool
>>
>> On 08/10/2012 10:32 AM, Mice Xia wrote:
>>>
>>> I remember that when a network partition happens, a pool slave may enter
>>> emergency mode and show as offline because it could not reach its master
>>> for a long time.
>>> Could you check hv1's console (the graphical console, not the ssh
>>> console) and check whether its NICs are shown correctly?
>>>
>>> Regards
>>> Mice
>>>
>> No, when I went into the xsconsole and tried to review all the settings,
>> it was not showing the management interfaces properly.
>>
>> --
>> Regards,
>>
>> Nik
>>
>>> -----Original Message-----
>>> From: Nik Martin [mailto:nik.martin@nfinausa.com]
>>> Sent: 2012-8-10 (Friday) 23:04
>>> To: cloudstack-users@incubator.apache.org
>>> Subject: Xen Host failure in pool
>>>
>>> We have a XenServer 6.2 based pool of three hosts running under the
>>> CloudStack Acton release (the code base is about two weeks old). When we
>>> left last night everything was fine, with about 2 VMs running on each
>>> host, not doing anything. This morning I came in and three VMs had
>>> stopped. I logged into XenCenter to see what the pool looked like: the
>>> pool master had changed from host HV3 to HV2, and HV1 was offline. I
>>> logged in to HV1's console and looked at /var/log/messages, which was
>>> complaining about the pool master address being wrong. I went into the
>>> CloudStack UI and deleted and re-added the host; the add failed
>>> immediately, and I got this in the log when I did:
>>>
>>>
>>> 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet]
>>> (catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags=
>>> 2012-08-10 09:56:39,573 INFO  [cloud.resource.ResourceManagerImpl]
>>> (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in
>>> data center 2
>>> 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool]
>>> (catalina-exec-24:null) Slave logon to 172.16.5.3
>>> 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool]
>>> (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to
>>> The master says the host is not known to it. Perhaps the Host was
>>> deleted from the master's database? Perhaps the slave is pointing to the
>>> wrong master?
>>> 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer]
>>> (catalina-exec-24:null) other exceptions: java.lang.RuntimeException:
>>> can not get master ip
>>> java.lang.RuntimeException: can not get master ip
>>> 	at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
>>> 	at com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>> 	at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>> 	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>> 	at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>> 	at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>> 	at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>> 	at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> 	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>> 	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>> 	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>> 	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> 	at java.lang.Thread.run(Thread.java:679)
>>> 2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl]
>>> (catalina-exec-24:null) Unable to find the server resources at
>>> http://172.16.5.3
>>> 2012-08-10 09:56:39,642 WARN  [api.commands.AddHostCmd]
>>> (catalina-exec-24:null) Exception:
>>> com.cloud.exception.DiscoveryException: Unable to add the host
>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
>>> 	at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>> 	at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>> 	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>> 	at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>> 	at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>> 	at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>> 	at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> 	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>> 	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>> 	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>> 	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> 	at java.lang.Thread.run(Thread.java:679)
>>> 2012-08-10 09:56:39,642 WARN  [cloud.api.ApiDispatcher]
>>> (catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable
>>> to add the host
>>> 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-305:null) Ping from 17
>>> 2012-08-10 09:56:43,822 DEBUG
>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone
>>> 2 is ready to launch secondary storage VM
>>> 2012-08-10 09:56:43,916 DEBUG
>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone
>>> 2 is ready to launch console proxy
>>> 2012-08-10 09:56:44,102 DEBUG
>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>> 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-12:null) Ping from 22
>>> 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-10:null) Ping from 18
>>> 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector]
>>> (StatsCollector-1:null) VmStatsCollector is running...
>>> 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-305:null) Seq 16-92408948: Executing request
>>> 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase]
>>> (DirectAgent-305:null) Vm cpu utilization 0.01
>>> 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-305:null) Seq 16-92408948: Response Received:
>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (DirectAgent-305:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request]
>>> (StatsCollector-1:null) Seq 16-92408948: Received:  { Ans: , MgmtId:
>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (StatsCollector-1:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-497:null) Ping from 17
>>> 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-338:null) Ping from 16
>>> 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-8:null) Ping from 22
>>> 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-9:null) Ping from 18
>>> 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-71:null) Ping from 16
>>> 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-338:null) Ping from 17
>>> 2012-08-10 09:57:13,821 DEBUG
>>> [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone
>>> 2 is ready to launch secondary storage VM
>>> 2012-08-10 09:57:13,918 DEBUG
>>> [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone
>>> 2 is ready to launch console proxy
>>> 2012-08-10 09:57:14,102 DEBUG
>>> [network.router.VirtualNetworkApplianceManagerImpl]
>>> (RouterStatusMonitor-1:null) Found 2 routers.
>>> 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-11:null) Ping from 22
>>> 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector]
>>> (StatsCollector-3:null) HostStatsCollector is running...
>>> 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-71:null) Seq 16-92408949: Executing request
>>> 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-71:null) Seq 16-92408949: Response Received:
>>> 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (DirectAgent-71:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request]
>>> (StatsCollector-3:null) Seq 16-92408949: Received:  { Ans: , MgmtId:
>>> 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>> 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-338:null) Seq 17-665190891: Executing request
>>> 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-338:null) Seq 17-665190891: Response Received:
>>> 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (DirectAgent-338:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request]
>>> (StatsCollector-3:null) Seq 17-665190891: Received:  { Ans: , MgmtId:
>>> 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>> 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl]
>>> (StatsCollector-3:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-15:null) Ping from 18
>>> 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-71:null) Ping from 17
>>> 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-338:null) Ping from 16
>>> 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl]
>>> (AgentManager-Handler-1:null) Ping from 22
>>> 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-294:null) Seq 16-92405762: Executing request
>>> 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-294:null) Seq 16-92405762: Response Received:
>>> 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request]
>>> (DirectAgent-294:null) Seq 16-92405762: Processing:  { Ans: , MgmtId:
>>> 130577622632, via: 16, Ver: v1, Flags: 10,
>>> [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}]
>>> }
>>> 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-357:null) Seq 17-665190402: Executing request
>>> 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache]
>>> (DirectAgent-357:null) Seq 17-665190402: Response Received:
>>> 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request]
>>> (DirectAgent-357:null) Seq 17-665190402: Processing:  { Ans: , MgmtId:
>>> 130577622632, via: 17, Ver: v1, Flags: 10,
>>> [{"Answer":{"result":true,"wait":0}}] }
>>>
>>> This is a very serious error, and I don't know how to fix it.  Can
>>> anyone suggest what might be the problem and how I might fix it?
>>>
>>>
>>
>>
> 



