Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 27A26200BF1 for ; Tue, 20 Dec 2016 03:10:24 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 26424160B30; Tue, 20 Dec 2016 02:10:24 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 1F13E160B21 for ; Tue, 20 Dec 2016 03:10:22 +0100 (CET) Received: (qmail 21736 invoked by uid 500); 20 Dec 2016 02:10:22 -0000 Mailing-List: contact users-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@cloudstack.apache.org Delivered-To: mailing list users@cloudstack.apache.org Received: (qmail 21720 invoked by uid 99); 20 Dec 2016 02:10:21 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Dec 2016 02:10:21 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id C2529180255 for ; Tue, 20 Dec 2016 02:10:20 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.101 X-Spam-Level: X-Spam-Status: No, score=-0.101 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=pulasan.my Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id E9axThslcjvt for ; Tue, 20 Dec 2016 02:10:14 +0000 (UTC) Received: from mail-oi0-f48.google.com (mail-oi0-f48.google.com [209.85.218.48]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 186515F365 for ; Tue, 20 Dec 2016 02:10:14 +0000 (UTC) Received: by mail-oi0-f48.google.com with SMTP id b126so164627900oia.2 for ; Mon, 19 Dec 2016 18:10:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pulasan.my; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-transfer-encoding; bh=RJPcjdQlPmJD+X4anTVNeh41jw4rkyRtGmwAtpZ7R0A=; b=H1EjQEKD/os1J7XJzwQv/qSW8MGJ8unX5GAnVJpasaUSa1rma1hOeyThjpNw7zkAPz 55MSEPRFD6W74MB+O9BTAzZmq+go0dQ4gI+oniLtqNyF++oDDa2B69aM3JLd8JdQoxWH G1tzyIB4cduZSHki7/j5UmdXaVJuFiLxF1oIA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-transfer-encoding; bh=RJPcjdQlPmJD+X4anTVNeh41jw4rkyRtGmwAtpZ7R0A=; b=e2V19m+I2p0dgQL7i1LBWBq8Y5V/DGbhBEC8MOKAwkhVaxxJQqpInxjiyq0EVHpc2r qBvfN3OM/Gmb0anuloQLd7rBIDBn62LnST+rOQv9uEmVJNL7bww+oSkobCGynJBA2lU+ Q16++4mVtp3L+l6qlso+gWiMT1WflxA5jMLu8qW1jpSEU4onz6D/FpEK4Lr8iorWwyzl mVHQfoKXv8NPnVZWWY9Oof7VJB6mSQGkxjS1DpyQ0yJbbxvsccC4IywBXRZr4x2suHhg 8BIsnSt44yEeE9V7br0S6VRplC4F2DdGLgs+nQwCI2mJ2KBDvuSYFoAmdDf+CKoERAeX Vkrw== X-Gm-Message-State: AIkVDXKxZS1bXF+9rXPfqI5nseezy+tnAFnVdT3GoWUv7o8oBV3Y0szYheewM/DxFDQG790uu4VOlgRpW7oIAA== X-Received: by 10.157.27.136 with SMTP id z8mr11879312otd.92.1482199796962; Mon, 19 Dec 2016 18:09:56 -0800 (PST) MIME-Version: 1.0 Received: by 10.202.67.215 with HTTP; Mon, 19 Dec 2016 18:09:56 -0800 (PST) X-Originating-IP: [210.195.51.214] In-Reply-To: References: From: Syahrul Sazli Shaharir Date: Tue, 20 Dec 2016 10:09:56 +0800 Message-ID: Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks To: users@cloudstack.apache.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable archived-at: Tue, 20 Dec 2016 02:10:24 -0000 On Mon, Dec 19, 2016 at 8:54 PM, Simon Weller wrote: > When you're in the console, can you ping the host ip? Yes - some (not all) of the IPs assigned on the host. > What are your ip tables rules on this host currently? Chain INPUT (policy ACCEPT) target prot opt source destination ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:53 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:67 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:67 Chain FORWARD (policy ACCEPT) target prot opt source destination ACCEPT all -- 0.0.0.0/0 192.168.122.0/24 ctstate RELATED,ESTABLISHED ACCEPT all -- 192.168.122.0/24 0.0.0.0/0 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable Chain OUTPUT (policy ACCEPT) target prot opt source destination ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:68 > Can you dump the routing table as well? Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt If= ace 0.0.0.0 172.16.30.33 0.0.0.0 UG 0 0 0 cloudbr2.304 10.1.30.0 0.0.0.0 255.255.255.0 U 0 0 0 bo= nd1 10.2.30.0 0.0.0.0 255.255.255.0 U 0 0 0 cloudbr2.352 10.3.30.0 0.0.0.0 255.255.255.0 U 0 0 0 cloudbr2.353 172.16.30.32 0.0.0.0 255.255.255.224 U 0 0 0 cloudbr2.304 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 cl= oud0 192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 vi= rbr0 > Have you tried a restart of one of the working networks to see if it fail= s on restart? Yes, and able to restart OK. I logged on to each network's router VM console during the patchviasocket.py stage onwards, and find the following difference (both VMs were created and booted in the same host):- 1. Working network router VM ( http://pastebin.com/Y6zpDa6M ) :- ..... Dec 20 01:37:55 r-686-VM cloud: Boot up process done Dec 20 01:37:55 r-686-VM cloud: VR config: configuation format version 1.0 Dec 20 01:37:55 r-686-VM cloud: VR config: creating file: /var/cache/cloud/monitor_service.json Dec 20 01:37:55 r-686-VM cloud: VR config: create file success Dec 20 01:37:55 r-686-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py monitor_service.json Dec 20 01:38:16 r-686-VM cloud: VR config: execution success Dec 20 01:38:16 r-686-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:38:16 r-686-VM cloud: VR config: create file success Dec 20 01:38:16 r-686-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:38:38 r-686-VM cloud: VR config: execution success Dec 20 01:38:38 r-686-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:38:38 r-686-VM cloud: VR config: create file success Dec 20 01:38:38 r-686-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:39:01 r-686-VM cloud: VR config: execution success Dec 20 01:39:01 r-686-VM cloud: VR config: creating file: /var/cache/cloud/vm_metadata.json Dec 20 01:39:01 r-686-VM cloud: VR config: create file success Dec 20 01:39:01 r-686-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_metadata.json Dec 20 01:39:21 r-686-VM cloud: VR config: execution success Dec 20 01:39:21 r-686-VM cloud: VR config: creating file: /var/cache/cloud/vm_metadata.json Dec 20 01:39:21 r-686-VM cloud: VR config: create file success Dec 20 01:39:21 r-686-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_metadata.json Dec 20 01:39:41 r-686-VM cloud: VR config: execution success Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table Dec 20 01:39:41 r-686-VM cloud: VR config: Flushing conntrack table complet= ed Dec 20 01:39:42 r-686-VM cloud: VR config: configuation format version 1.0 Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table Dec 20 01:39:42 r-686-VM cloud: VR config: Flushing conntrack table complet= ed 2. Non-working network router VM ( http://pastebin.com/jzfGMGQB ):- ..... Dec 20 01:44:21 r-687-VM cloud: Boot up process done Dec 20 01:44:22 r-687-VM cloud: VR config: configuation format version 1.0 Dec 20 01:44:22 r-687-VM cloud: VR config: creating file: /var/cache/cloud/monitor_service.json Dec 20 01:44:22 r-687-VM cloud: VR config: create file success Dec 20 01:44:22 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py monitor_service.json Dec 20 01:44:42 r-687-VM cloud: VR config: execution success Dec 20 01:44:42 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:44:42 r-687-VM cloud: VR config: create file success Dec 20 01:44:42 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:05 r-687-VM cloud: VR config: execution success Dec 20 01:45:05 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:05 r-687-VM cloud: VR config: create file success Dec 20 01:45:05 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:27 r-687-VM cloud: VR config: execution success Dec 20 01:45:27 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:27 r-687-VM cloud: VR config: create file success Dec 20 01:45:27 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:45:49 r-687-VM cloud: VR config: execution success Dec 20 01:45:49 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:45:49 r-687-VM cloud: VR config: create file success Dec 20 01:45:49 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:46:12 r-687-VM cloud: VR config: execution success Dec 20 01:46:12 r-687-VM cloud: VR config: creating file: /var/cache/cloud/vm_dhcp_entry.json Dec 20 01:46:12 r-687-VM cloud: VR config: create file success Dec 20 01:46:12 r-687-VM cloud: VR config: executing: /opt/cloud/bin/update_config.py vm_dhcp_entry.json Dec 20 01:46:22 r-687-VM shutdown[3919]: shutting down for system halt Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): The system is going down for system halt NOW! Dec 20 01:46:22 r-687-VM shutdown[3962]: shutting down for system halt Broadcast message from root@r-687-VM (Tue Dec 20 01:46:22 2016): Power button pressed The system is going down for system halt NOW! Dec 20 01:46:23 r-687-VM KVP: KVP starting; pid is:4037 Dec 20 01:46:23 r-687-VM cloud: VR config: executing failed: /opt/cloud/bin/update_config.py vm_dhcp_entry.json debug1: channel 0: free: client-session, nchannels 1 Connection to 169.254.0.197 closed by remote host. Connection to 169.254.0.197 closed. Transferred: sent 4336, received 93744 bytes, in 180.3 seconds Bytes per second: sent 24.0, received 519.8 debug1: Exit status -1 Looks like the config script didn't get past vm_dhcp_entry.json ? Thanks. > > > > ________________________________ > From: Syahrul Sazli Shaharir > Sent: Monday, December 19, 2016 2:09 AM > To: users@cloudstack.apache.org > Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 net= works > > On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir > wrote: >> Hi Simon, >> >> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller wrote: >>> Can you turn on agent debug mode and take a look at the debug level log= s? >>> >>> >>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agen= t/log4j-cloud.xml on the host and then restarting the agent. >> >> Here are the debug logs - patchviasocket.py executed OK but couldn't >> connect to the router VM's internal IP:- >> >> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing: >> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py >> -n r-669-VM -p %template=3DdomP%name=3Dr-669-VM%eth0ip=3D10.3.28.10%eth0= mask=3D255.255.255.0%gateway=3D10.3.28.1%domain=3Dnocser.net%cidrsize=3D24%= dhcprange=3D10.3.28.1%eth1ip=3D169.254.3.7%eth1mask=3D255.255.0.0%type=3Ddh= cpsrvr%disable_rp_filter=3Dtrue%dns1=3D8.8.8.8%dns2=3D8.8.4.4%ip6dns1=3D%ip= 6dns2=3D%baremetalnotificationsecuritykey=3DuavJByNGGjNLrELG-qbdN99__1I3tnp= 8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificati= onapikey=3D8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc= 7ZrCauiz23tp2-Wa3AASlg%host=3D10.2.30.11%port=3D8080 >> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is >> successful. >> 2016-12-13 19:23:18,742 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to >> 169.254.3.7 >> 2016-12-13 19:23:21,749 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to >> 169.254.3.7 >> 2016-12-13 19:23:26,750 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to >> 169.254.3.7 >> 2016-12-13 19:23:29,757 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to >> 169.254.3.7 >> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent] >> (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command: >> com.cloud.agent.api.GetHostStatsCommand >> 2016-12-13 19:23:34,759 DEBUG >> [resource.virtualnetwork.VirtualRoutingResource] >> (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to >> 169.254.3.7 >> >> virsh console also failed to show anything. > > Ok after upgrading to latest qemu-kvm-ev-2.6.0-27.1.el7, this time I > got to the console at some stage, but patchviasocket.py still times > out. Here are the console output:- > > http://pastebin.com/n37aHeSa > [http://pastebin.com/i/facebook.png] > > Router VM's short lifetime - Pastebin.com > pastebin.com > > > > > Thanks. > > >>> ________________________________ >>> From: Syahrul Sazli Shaharir >>> Sent: Monday, December 12, 2016 8:21 PM >>> To: users@cloudstack.apache.org >>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 netwo= rks >>> >>> Hi, >>> >>> I am running latest Cloudstack 4.9.0.1 on CentOS 7 KVM + ceph >>> environment. After running for some time, I faced with an issue with >>> one out of 4 networks - following a heartbeat-induced reset on all >>> hosts, the associated virtual router would not get recreated and >>> started properly on any of the 3 hosts I have, even after repeated >>> attempts of the following:- >>> - destroy-recreate cycles, via Cloudstack UI >>> - restartNetwork cleanup=3Dtrue API calls (failed with errorcode =3D 53= 0). >>> - redownload and reregister system VM template as another entry and >>> assign to router VM in global setting (boots the new template OK, but >>> still same problem) >>> - tweak default system offering for router VM (increased RAM from 256 t= o 512MB) >>> - created new system offering, with RAM tweak, and use of ceph rbd >>> store, and assigned it to Cloud.Com-SoftwareRouter as per docs - which >>> didnt work for some reason: it kept on using initial default offering >>> and created image on local host storage >>> - upgrade to latest cloudstack (previously was running 4.8) >>> >>> As with a handful of others in this list archives, virsh list and >>> dumpxml shows the VM created OK but failed soon after booting, as >>> found in the following error in agent.log :- >>> >>> 2016-12-13 10:03:33,894 WARN [kvm.resource.LibvirtComputingResource] >>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: >>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.p= y >>> -n r-668-VM -p %template=3DdomP%name=3Dr-668-VM%eth0ip=3D10.3.28.10%eth= 0mask=3D255.255.255.0%gateway=3D10.3.28.1%domain=3Dnocser.net%cidrsize=3D24= %dhcprange=3D10.3.28.1%eth1ip=3D169.254.0.33%eth1mask=3D255.255.0.0%type=3D= dhcpsrvr%disable_rp_filter=3Dtrue%dns1=3D8.8.8.8%dns2=3D8.8.4.4%ip6dns1=3D%= ip6dns2=3D%baremetalnotificationsecuritykey=3DuavJByNGGjNLrELG-qbdN99__1I3t= np8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotifica= tionapikey=3D8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qph= hc7ZrCauiz23tp2-Wa3AASlg%host=3D10.2.30.11%port=3D8080 >>> . Output is: >>> ..... >>> 2016-12-13 10:05:45,895 WARN [kvm.resource.LibvirtComputingResource] >>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out: >>> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh >>> vr_cfg.sh 169.254.0.33 -c >>> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg . Output >>> is: >>> >>> As mentioned, this only happens with 1 network (always the same >>> network). The other router VMs work OK. Any clues on how to >>> troubleshoot this further, would be greatly appreciated. >>> >>> Thanks. >>> >>> -- >>> --sazli --=20 --sazli