From dev-return-113506-archive-asf-public=cust-asf.ponee.io@cloudstack.apache.org Thu Jul 25 14:04:50 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id EE5ED18066C for ; Thu, 25 Jul 2019 16:04:49 +0200 (CEST) Received: (qmail 91330 invoked by uid 500); 25 Jul 2019 14:04:48 -0000 Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list dev@cloudstack.apache.org Received: (qmail 91313 invoked by uid 99); 25 Jul 2019 14:04:48 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jul 2019 14:04:48 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id CAE33C08A8 for ; Thu, 25 Jul 2019 14:04:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.199 X-Spam-Level: X-Spam-Status: No, score=-0.199 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Ugvro9c7DGjP for ; Thu, 25 Jul 2019 14:04:45 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::531; helo=mail-ed1-x531.google.com; envelope-from=www.rakeshv.com@gmail.com; receiver= Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id C3DD67DC04 for ; Thu, 25 Jul 2019 14:04:44 +0000 (UTC) Received: by mail-ed1-x531.google.com with SMTP id k21so50378620edq.3 for ; Thu, 25 Jul 2019 07:04:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:content-transfer-encoding:mime-version:date:subject:message-id :references:in-reply-to:to; bh=38znvwTdbgcmfO9Y2HNxEXTzWyje8v8MPlL6fXXMRrY=; b=Rej95yHpmfWoyzmQ/Y32MAuf3CGx6IUAviZGgm7+S/HOuvapuEOhig3w209PHRHGRc f43vnwyFszl0zsT/Ln60EwtUpslyuADYlMPg5kc/z7BiXjjIgemT+57Mcsdl9Euxio9R CRbb4fhsn78fyfk8m2THKE5mbVNwGPk2R3LS0ZFCUpURYoYjBR/ayOQjK6isc85caIHW OC69HnZb1OXX4szmxJAPs136KMzne0DqY2dc+Z53LUSQjKbrjV92trqDpI/Rh6/TN2jy aWPZXwcPH8Hmbt6q+hu6kGOqpnrFDEzUHJNEDVcY4heqv+U81XNtAzl1MZh0lU/I+46f taOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version:date :subject:message-id:references:in-reply-to:to; bh=38znvwTdbgcmfO9Y2HNxEXTzWyje8v8MPlL6fXXMRrY=; b=eBgOJOlYPxHL1sV78fXev+DJKpdrH8/wV4t3+UcwIgNURmwFoqE5BFjqa3wi0ZbI+s olgc/3s+AF7m4BbzE7NQAeCV5oDyvpjGaembj9mBvTQk4rHztPv759yGCXYW2gAXmVps qba3wTfSetm6eBsyazWEzQW77XkmDBJD2e+b4Y/jRyIo7/Uq3KPL7g8TVbMXCcoKGI+x Kw0tEHxjxMUdPjj4edpEV33MraLNAT0sK7C2b1SKgFFXJfjjr22o1oGJRUN0LR00swkQ Tw2J11Yv8cqSH7gHY96dnQsGR9MwWZrtM0mFSeUDCKVrqo0GRMV+8JrfK9Npb0qzVqDX 5HjA== X-Gm-Message-State: APjAAAXzcR4WEOSVA/+tooaxxz4ENkp1iemBzYt0FN/wjflX43QLQQkz clbMhAcF4ofMzqXb82xE8rLsSpqqAs0= X-Google-Smtp-Source: APXvYqxU1KiuK5s+IEAv9abKLPyKo2AKlcilzkUubLMR5AyEENZriyXehS3Ce2bOrvJzd6kZASwWQw== X-Received: by 2002:a17:906:a952:: with SMTP id hh18mr68512386ejb.289.1564063483913; Thu, 25 Jul 2019 07:04:43 -0700 (PDT) Received: from [192.168.105.66] ([87.255.57.201]) by smtp.gmail.com with ESMTPSA id a6sm12908424eds.19.2019.07.25.07.04.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 25 Jul 2019 07:04:43 -0700 (PDT) From: Rakesh v Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (1.0) Date: Thu, 25 Jul 2019 16:04:42 +0200 Subject: Re: Secondary Storage VM timeout issue every hour Message-Id: <48B63A9F-7187-44AD-ADFA-FD3F95F51D7B@gmail.com> References: <612AFCF8-2663-4C09-AF44-A9CD80A3B2BD@gmail.com> In-Reply-To: To: dev@cloudstack.apache.org X-Mailer: iPhone Mail (16F203) Yes I was monitoring it continuously. Below are the steps which I was doing w= hen issue happened 1. Ping from MGT server to ssvm 2. Ping from ssvm to secondary storage ip 3. Ping from ssvm to public IP like 8.8.8.8 4. Ping from MGT server to node in which ssvm was running Out of all these, the ping drops were observed from MGT server to ssvm and m= gt server to nodes. Basically all nodes lost connection. Then it recovered i= tself after 1 minute. Sent from my iPhone > On 25-Jul-2019, at 3:48 PM, Andrija Panic wrote:= >=20 > Can you observe the status of SSVM (is it UP/Connecting/Disconnected/Down)= > while you have issues? >=20 > I would advise checking your Secondary Storage itself - and also running > the SSVM diagnose script /usr/local/cloud/systemvm/ssvm-check.sh - observ= e > if any errors with NFS or others. >=20 > Lastly - and don't laugh - check that you don't have issues with networkin= g > equipment (some of us had VEEEERY strange issues in connectivity some year= s > ago with crappy QCT/Quanta Switches in MLAG setup) >=20 > Andrija >=20 >> On Thu, 25 Jul 2019 at 15:42, Rakesh v wrote:= >>=20 >> Yes I have set the ip's of the three MGT servers in the "host" field >>=20 >> Sent from my iPhone >>=20 >>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion wrote:= >>>=20 >>> Do you have a load balancer in front of cloudstack? Did you set the >> global >>> settings "host" to the ip of the mgmt server? >>>=20 >>>=20 >>> Le jeu. 25 juill. 2019 03 h 24, Rakesh Venkatesh < >> www.rakeshv.com@gmail.com> >>> a =C3=A9crit : >>>=20 >>>> Hello People >>>>=20 >>>>=20 >>>> I have a strange issue where mgt server times out to send a command to >>>> secondary storage VM every hour and because of this UI won't be >> accessible >>>> for a short duration of time. Sometimes I have to restart mgt server to= >> get >>>> it back to working state and sometimes I don't need to restart it. I >> also >>>> see some exceptions while fetching the storage stats. >>>>=20 >>>>=20 >>>> The log says secondary storage VM is lagging behind mgt server in ping >> and >>>> it sends a disconnect message to other components. Can you let me know >> how >>>> to troubleshoot this issue? I destroyed the secondary storage VM but th= e >>>> issue still persists. I checked the date/time on the mgt server and SSV= M >>>> and they are same. This is happening for quite a few days now. Below ar= e >>>> the logs >>>>=20 >>>>=20 >>>>=20 >>>> 2019-07-25 04:01:22,769 INFO [c.c.a.m.AgentManagerImpl] >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following >> agents >>>> behind on ping: [183] >>>> 2019-07-25 04:01:22,775 WARN [c.c.a.m.AgentManagerImpl] >>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for >>>> CPVM/SSVM due to physical connection close. host: 183 >>>> 2019-07-25 04:01:22,778 INFO [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is >> disconnecting >>>> with event ShutdownRequested >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agen= t >>>> 183is Disconnected, current status is Up >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for >> 183 >>>> with state Disconnected >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183 >>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect. >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq >>>> 183-7541559051008607242: Sending disconnect to class >>>> com.cloud.agent.manager.SynchronousListener >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer= >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection] >>>> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=3D/ >> 172.30.32.16 >>>> ,port=3D38250,localport=3D8250] >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq >>>> 183-7541559051008607242: Waiting some more time because this is the >> current >>>> command >>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer= >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq >>>> 183-7541559051008607242: Waiting some more time because this is the >> current >>>> command >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.deploy.DeploymentPlanningManagerImpl >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.network.security.SecurityGroupListener >>>> 2019-07-25 04:01:22,783 INFO [c.c.u.e.CSExceptionErrorCode] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find >> exception: >>>> com.cloud.exception.OperationTimedoutException in error code list for >>>> exceptions >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrato= r >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl >>>> 2019-07-25 04:01:22,783 WARN [c.c.a.m.AgentAttache] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq >>>> 183-7541559051008607242: Timed out on null >>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.storage.listener.StoragePoolMonitor >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq >>>> 183-7541559051008607242: Cancelling. >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.storage.secondary.SecondaryStorageListener >>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.network.SshKeysDistriMonitor >>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command= , >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException: >> Commands >>>> 7541559051008607242 to Host 183 timed out after 3600 >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImp= l >>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.storage.download.DownloadListener >>>>=20 >>>>=20 >>>>=20 >>>>=20 >>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector] >>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to >> retrieve >>>> storage stats >>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send command= , >>>> due to Agent:183, com.cloud.exception.OperationTimedoutException: >> Commands >>>> 7541559051008607242 to Host 183 timed out after 3600 >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEn= dPoint.java:133) >>>> at >>>>=20 >>>>=20 >> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollec= tor.java:1139) >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(Manage= dContextRunnable.java:49) >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(D= efaultManagedContext.java:56) >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWith= Context(DefaultManagedContext.java:103) >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithC= ontext(DefaultManagedContext.java:53) >>>> at >>>>=20 >>>>=20 >> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedC= ontextRunnable.java:46) >>>> at >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)= >>>> at >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) >>>> at >>>>=20 >>>>=20 >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acce= ss$301(ScheduledThreadPoolExecutor.java:180) >>>> at >>>>=20 >>>>=20 >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(= ScheduledThreadPoolExecutor.java:294) >>>> at >>>>=20 >>>>=20 >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java= :1149) >>>> at >>>>=20 >>>>=20 >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:624) >>>> at java.lang.Thread.run(Thread.java:748) >>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.consoleproxy.ConsoleProxyListener >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.storage.LocalStoragePoolListener >>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.storage.upload.UploadListener >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.capacity.StorageCapacityListener >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.capacity.ComputeCapacityListener >>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.network.SshKeysDistriMonitor >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: >>>> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on >> 183 >>>> with status Disconnected >>>>=20 >>>>=20 >>>>=20 >>>>=20 >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener= >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] >>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to >>>> listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListene= r >>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] >> (AgentTaskPool-1:ctx-66de2057) >>>> (logid:841d2a63) Transition:[Resource state =3D Enabled, Agent event =3D= >>>> ShutdownRequested, Host id =3D 183, name =3D s-2775-VM] >>>>=20 >>>>=20 >>>>=20 >>>> -- >>>> Thanks and regards >>>> Rakesh venkatesh >>>>=20 >>=20 >=20 >=20 > --=20 >=20 > Andrija Pani=C4=87