cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrija Panic <andrija.pa...@gmail.com>
Subject Re: Secondary Storage VM timeout issue every hour
Date Thu, 25 Jul 2019 14:46:56 GMT
since basic network connectivity (ping failures) was down between mgmts and
nodes (and SSVM on it)  - I would point my finger to your networking
equipment - i.e. I expect zero problems with ACS (since pings fail).

Let us know how it goes.

Andrija

On Thu, 25 Jul 2019 at 16:04, Rakesh v <www.rakeshv.com@gmail.com> wrote:

> Yes I was monitoring it continuously. Below are the steps which I was
> doing when issue happened
>
>
> 1. Ping from MGT server to ssvm
> 2. Ping from ssvm to secondary storage ip
> 3. Ping from ssvm to public IP like 8.8.8.8
> 4. Ping from MGT server to node in which ssvm was running
>
>
> Out of all these, the ping drops were observed from MGT server to ssvm and
> mgt server to nodes. Basically all nodes lost connection. Then it recovered
> itself after 1 minute.
>
>
> Sent from my iPhone
>
> > On 25-Jul-2019, at 3:48 PM, Andrija Panic <andrija.panic@gmail.com>
> wrote:
> >
> > Can you observe the status of SSVM (is it
> UP/Connecting/Disconnected/Down)
> > while you have issues?
> >
> > I would advise checking your Secondary Storage itself - and also running
> > the SSVM diagnose script  /usr/local/cloud/systemvm/ssvm-check.sh -
> observe
> > if any errors with NFS or others.
> >
> > Lastly - and don't laugh - check that you don't have issues with
> networking
> > equipment (some of us had VEEEERY strange issues in connectivity some
> years
> > ago with crappy QCT/Quanta Switches in MLAG setup)
> >
> > Andrija
> >
> >> On Thu, 25 Jul 2019 at 15:42, Rakesh v <www.rakeshv.com@gmail.com>
> wrote:
> >>
> >> Yes I have set the ip's of the three MGT servers in the "host" field
> >>
> >> Sent from my iPhone
> >>
> >>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <pdion891@apache.org>
> wrote:
> >>>
> >>> Do you have a load balancer in front of cloudstack? Did you set the
> >> global
> >>> settings "host" to the ip of the mgmt server?
> >>>
> >>>
> >>> Le jeu. 25 juill. 2019 03 h 24, Rakesh Venkatesh <
> >> www.rakeshv.com@gmail.com>
> >>> a écrit :
> >>>
> >>>> Hello People
> >>>>
> >>>>
> >>>> I have a strange issue where mgt server times out to send a command
to
> >>>> secondary storage VM every hour and because of this UI won't be
> >> accessible
> >>>> for a short duration of time. Sometimes I have to restart mgt server
> to
> >> get
> >>>> it back to working state and sometimes I don't need to restart it. I
> >> also
> >>>> see some exceptions while fetching the storage stats.
> >>>>
> >>>>
> >>>> The log says secondary storage VM is lagging behind mgt server in ping
> >> and
> >>>> it sends a disconnect message to other components. Can you let me know
> >> how
> >>>> to troubleshoot this issue? I destroyed the secondary storage VM but
> the
> >>>> issue still persists. I checked the date/time on the mgt server and
> SSVM
> >>>> and they are same. This is happening for quite a few days now. Below
> are
> >>>> the logs
> >>>>
> >>>>
> >>>>
> >>>> 2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl]
> >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following
> >> agents
> >>>> behind on ping: [183]
> >>>> 2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl]
> >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for
> >>>> CPVM/SSVM due to physical connection close. host: 183
> >>>> 2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is
> >> disconnecting
> >>>> with event ShutdownRequested
> >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of
> agent
> >>>> 183is Disconnected, current status is Up
> >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for
> >> 183
> >>>> with state Disconnected
> >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
> >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect.
> >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq
> >>>> 183-7541559051008607242: Sending disconnect to class
> >>>> com.cloud.agent.manager.SynchronousListener
> >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
> >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection]
> >>>> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/
> >> 172.30.32.16
> >>>> ,port=38250,localport=8250]
> >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>> 183-7541559051008607242: Waiting some more time because this is the
> >> current
> >>>> command
> >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>> 183-7541559051008607242: Waiting some more time because this is the
> >> current
> >>>> command
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.deploy.DeploymentPlanningManagerImpl
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.network.security.SecurityGroupListener
> >>>> 2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find
> >> exception:
> >>>> com.cloud.exception.OperationTimedoutException in error code list for
> >>>> exceptions
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
> >>>> 2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>> 183-7541559051008607242: Timed out on null
> >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.storage.listener.StoragePoolMonitor
> >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>> 183-7541559051008607242: Cancelling.
> >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.storage.secondary.SecondaryStorageListener
> >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.network.SshKeysDistriMonitor
> >>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send
> command,
> >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
> >> Commands
> >>>> 7541559051008607242 to Host 183 timed out after 3600
> >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
> >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.storage.download.DownloadListener
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector]
> >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to
> >> retrieve
> >>>> storage stats
> >>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send
> command,
> >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
> >> Commands
> >>>> 7541559051008607242 to Host 183 timed out after 3600
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
> >>>>       at
> >>>>
> >>>>
> >>
> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >>>>       at
> >>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >>>>       at
> >>>>
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>>>       at
> >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >>>>       at
> >>>>
> >>>>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >>>>       at
> >>>>
> >>>>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >>>>       at
> >>>>
> >>>>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>>       at
> >>>>
> >>>>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>>       at java.lang.Thread.run(Thread.java:748)
> >>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.consoleproxy.ConsoleProxyListener
> >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.storage.LocalStoragePoolListener
> >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.storage.upload.UploadListener
> >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.capacity.StorageCapacityListener
> >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.capacity.ComputeCapacityListener
> >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.network.SshKeysDistriMonitor
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> >>>> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called
on
> >> 183
> >>>> with status Disconnected
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
> >>>> listener:
> com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
> >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status]
> >> (AgentTaskPool-1:ctx-66de2057)
> >>>> (logid:841d2a63) Transition:[Resource state = Enabled, Agent event =
> >>>> ShutdownRequested, Host id = 183, name = s-2775-VM]
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks and regards
> >>>> Rakesh venkatesh
> >>>>
> >>
> >
> >
> > --
> >
> > Andrija Panić
>


-- 

Andrija Panić

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message