From dev-return-113507-archive-asf-public=cust-asf.ponee.io@cloudstack.apache.org Thu Jul 25 14:47:25 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 70BFD18066C for ; Thu, 25 Jul 2019 16:47:25 +0200 (CEST) Received: (qmail 81693 invoked by uid 500); 25 Jul 2019 14:47:24 -0000 Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list dev@cloudstack.apache.org Received: (qmail 81677 invoked by uid 99); 25 Jul 2019 14:47:23 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jul 2019 14:47:23 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 11465C01D1 for ; Thu, 25 Jul 2019 14:47:23 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.801 X-Spam-Level: ** X-Spam-Status: No, score=2.801 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id NjQywDRuv65Y for ; Thu, 25 Jul 2019 14:47:20 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.208.51; helo=mail-ed1-f51.google.com; envelope-from=andrija.panic@gmail.com; receiver= Received: from mail-ed1-f51.google.com (mail-ed1-f51.google.com [209.85.208.51]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id 3879BBC7F2 for ; Thu, 25 Jul 2019 14:47:20 +0000 (UTC) Received: by mail-ed1-f51.google.com with SMTP id m10so50471699edv.6 for ; Thu, 25 Jul 2019 07:47:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=V5ieRrct+DKHRg5RlPOvQZgdQyC1q8DXhoWO0pT3JWE=; b=T6tb8YNBFvukw2d62RfnAUs5yUlJ/PVLORet6WsgC5K+Gb/alESXyyYAFPMY0CilpW c2Qs+hip7YnNXIFET9y9BNOHjTcYMad/pqsVMIhPzCnGxWuSwU2fwkots3m9vIsf/tKL zlppfs8eyzMoWj8jn9UQxRRtdScsklxYriEM0iJdUeeXwRh/Mqmw/sHrnF/5ZiNouql+ 8G/apZFqw92tC1Yr5uWsdJRuTue5G/4+vaEhlohGVpJDhP/yu4HuuSf2mvtFRvn0V7s2 X2gPZqkSLpnr6J3M0F5WwNNdW3EcsZTJ1KpXS8eMo+RlhHwbgvDvfwMOVKoBTyYeszOZ K71A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=V5ieRrct+DKHRg5RlPOvQZgdQyC1q8DXhoWO0pT3JWE=; b=AmKFvYdwEeTb3Hq/srZ/z7gDpwzUB5B8vpEenysEzSLS5FpiIkcnGoc5XRTG79EpVy bQI/avCd8+atUcXRw9vx1YjioR3LOJWf7Sc/EYPMj+3C0BCAMh6P4rCke0VPwW06f5QU BrBSpQob/M74R2vhjUdjpNPXzUAGzDjhQ/Kgkh+yr0dvc7CevtrzADCaeeyHVuEC5AKf nj7vfpAoycjA9dMAxzGG9dhdm1X2fewYJXv/WBez3WShSRezOuJYXEtEkG3USBxmf9b5 nKCDQfiEuQI6e/RhtzNKqY75f43kGMf57veDYNtHNR1aMPyvMhmDoi9x5wv8zB/Ivhdu vToA== X-Gm-Message-State: APjAAAVPvBuW8vhJB0u9wlEXaNR1jWN7Q8CpBUYXbSVHJtMbwpmuVx77 IGktnVM1nVBgqvR8bSUh1K/tVXrFOCWhAgRNpDYx1w== X-Google-Smtp-Source: APXvYqxNriIvnlkgURrAHC9x+D2sxres8fNypDxkm1abl3rmtJlA/rr+2iE/4H6TrxE8gH/LmbD9S1L9xkzdo8GH8GU= X-Received: by 2002:a05:6402:3d5:: with SMTP id t21mr77001159edw.13.1564066033062; Thu, 25 Jul 2019 07:47:13 -0700 (PDT) MIME-Version: 1.0 References: <612AFCF8-2663-4C09-AF44-A9CD80A3B2BD@gmail.com> <48B63A9F-7187-44AD-ADFA-FD3F95F51D7B@gmail.com> In-Reply-To: <48B63A9F-7187-44AD-ADFA-FD3F95F51D7B@gmail.com> From: Andrija Panic Date: Thu, 25 Jul 2019 16:46:56 +0200 Message-ID: Subject: Re: Secondary Storage VM timeout issue every hour To: dev Content-Type: multipart/alternative; boundary="0000000000004e581c058e8280be" --0000000000004e581c058e8280be Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable since basic network connectivity (ping failures) was down between mgmts and nodes (and SSVM on it) - I would point my finger to your networking equipment - i.e. I expect zero problems with ACS (since pings fail). Let us know how it goes. Andrija On Thu, 25 Jul 2019 at 16:04, Rakesh v wrote: > Yes I was monitoring it continuously. Below are the steps which I was > doing when issue happened > > > 1. Ping from MGT server to ssvm > 2. Ping from ssvm to secondary storage ip > 3. Ping from ssvm to public IP like 8.8.8.8 > 4. Ping from MGT server to node in which ssvm was running > > > Out of all these, the ping drops were observed from MGT server to ssvm an= d > mgt server to nodes. Basically all nodes lost connection. Then it recover= ed > itself after 1 minute. > > > Sent from my iPhone > > > On 25-Jul-2019, at 3:48 PM, Andrija Panic > wrote: > > > > Can you observe the status of SSVM (is it > UP/Connecting/Disconnected/Down) > > while you have issues? > > > > I would advise checking your Secondary Storage itself - and also runnin= g > > the SSVM diagnose script /usr/local/cloud/systemvm/ssvm-check.sh - > observe > > if any errors with NFS or others. > > > > Lastly - and don't laugh - check that you don't have issues with > networking > > equipment (some of us had VEEEERY strange issues in connectivity some > years > > ago with crappy QCT/Quanta Switches in MLAG setup) > > > > Andrija > > > >> On Thu, 25 Jul 2019 at 15:42, Rakesh v > wrote: > >> > >> Yes I have set the ip's of the three MGT servers in the "host" field > >> > >> Sent from my iPhone > >> > >>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion > wrote: > >>> > >>> Do you have a load balancer in front of cloudstack? Did you set the > >> global > >>> settings "host" to the ip of the mgmt server? > >>> > >>> > >>> Le jeu. 25 juill. 2019 03 h 24, Rakesh Venkatesh < > >> www.rakeshv.com@gmail.com> > >>> a =C3=A9crit : > >>> > >>>> Hello People > >>>> > >>>> > >>>> I have a strange issue where mgt server times out to send a command = to > >>>> secondary storage VM every hour and because of this UI won't be > >> accessible > >>>> for a short duration of time. Sometimes I have to restart mgt server > to > >> get > >>>> it back to working state and sometimes I don't need to restart it. I > >> also > >>>> see some exceptions while fetching the storage stats. > >>>> > >>>> > >>>> The log says secondary storage VM is lagging behind mgt server in pi= ng > >> and > >>>> it sends a disconnect message to other components. Can you let me kn= ow > >> how > >>>> to troubleshoot this issue? I destroyed the secondary storage VM but > the > >>>> issue still persists. I checked the date/time on the mgt server and > SSVM > >>>> and they are same. This is happening for quite a few days now. Below > are > >>>> the logs > >>>> > >>>> > >>>> > >>>> 2019-07-25 04:01:22,769 INFO [c.c.a.m.AgentManagerImpl] > >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following > >> agents > >>>> behind on ping: [183] > >>>> 2019-07-25 04:01:22,775 WARN [c.c.a.m.AgentManagerImpl] > >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for > >>>> CPVM/SSVM due to physical connection close. host: 183 > >>>> 2019-07-25 04:01:22,778 INFO [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is > >> disconnecting > >>>> with event ShutdownRequested > >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of > agent > >>>> 183is Disconnected, current status is Up > >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link f= or > >> 183 > >>>> with state Disconnected > >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183 > >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnec= t. > >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq > >>>> 183-7541559051008607242: Sending disconnect to class > >>>> com.cloud.agent.manager.SynchronousListener > >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer > >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection] > >>>> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=3D/ > >> 172.30.32.16 > >>>> ,port=3D38250,localport=3D8250] > >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >>>> 183-7541559051008607242: Waiting some more time because this is the > >> current > >>>> command > >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >>>> 183-7541559051008607242: Waiting some more time because this is the > >> current > >>>> command > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.deploy.DeploymentPlanningManagerImpl > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.network.security.SecurityGroupListener > >>>> 2019-07-25 04:01:22,783 INFO [c.c.u.e.CSExceptionErrorCode] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find > >> exception: > >>>> com.cloud.exception.OperationTimedoutException in error code list fo= r > >>>> exceptions > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > org.apache.cloudstack.engine.orchestration.NetworkOrchestrator > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl > >>>> 2019-07-25 04:01:22,783 WARN [c.c.a.m.AgentAttache] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >>>> 183-7541559051008607242: Timed out on null > >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.storage.listener.StoragePoolMonitor > >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >>>> 183-7541559051008607242: Cancelling. > >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.storage.secondary.SecondaryStorageListener > >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.network.SshKeysDistriMonitor > >>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send > command, > >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException: > >> Commands > >>>> 7541559051008607242 to Host 183 timed out after 3600 > >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl > >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.storage.download.DownloadListener > >>>> > >>>> > >>>> > >>>> > >>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector] > >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to > >> retrieve > >>>> storage stats > >>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send > command, > >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException: > >> Commands > >>>> 7541559051008607242 to Host 183 timed out after 3600 > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEn= dPoint.java:133) > >>>> at > >>>> > >>>> > >> > com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollec= tor.java:1139) > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(Manage= dContextRunnable.java:49) > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(D= efaultManagedContext.java:56) > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWith= Context(DefaultManagedContext.java:103) > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithC= ontext(DefaultManagedContext.java:53) > >>>> at > >>>> > >>>> > >> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedC= ontextRunnable.java:46) > >>>> at > >>>> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > >>>> at > >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > >>>> at > >>>> > >>>> > >> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acce= ss$301(ScheduledThreadPoolExecutor.java:180) > >>>> at > >>>> > >>>> > >> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(= ScheduledThreadPoolExecutor.java:294) > >>>> at > >>>> > >>>> > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java= :1149) > >>>> at > >>>> > >>>> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:624) > >>>> at java.lang.Thread.run(Thread.java:748) > >>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.consoleproxy.ConsoleProxyListener > >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.storage.LocalStoragePoolListener > >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.storage.upload.UploadListener > >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.capacity.StorageCapacityListener > >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.capacity.ComputeCapacityListener > >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.network.SshKeysDistriMonitor > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: com.cloud.network.router.VirtualNetworkApplianceManagerImp= l > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > >>>> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called = on > >> 183 > >>>> with status Disconnected > >>>> > >>>> > >>>> > >>>> > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect t= o > >>>> listener: > com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener > >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] > >> (AgentTaskPool-1:ctx-66de2057) > >>>> (logid:841d2a63) Transition:[Resource state =3D Enabled, Agent event= =3D > >>>> ShutdownRequested, Host id =3D 183, name =3D s-2775-VM] > >>>> > >>>> > >>>> > >>>> -- > >>>> Thanks and regards > >>>> Rakesh venkatesh > >>>> > >> > > > > > > -- > > > > Andrija Pani=C4=87 > --=20 Andrija Pani=C4=87 --0000000000004e581c058e8280be--