cloudstack-users mailing list archives

From Simon Weller <swel...@ena.com>
Subject Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
Date Mon, 19 Dec 2016 12:54:36 GMT
When you're in the console, can you ping the host ip?

What are your iptables rules on this host currently?

Can you dump the routing table as well?


Have you tried a restart of one of the working networks to see if it fails on restart?
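
The checks above look at the path between the host and the router VM. As a complement from the host side, here is a minimal sketch of a TCP reachability probe against the router's link-local address. The function name `can_connect` is hypothetical, and the port in the usage comment (3922, the SSH port commonly used by CloudStack system VMs) is an assumption that this thread does not confirm.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use on the KVM host (address taken from the agent log in this
# thread; the port number is an assumption):
#   can_connect("169.254.3.7", 3922)
```

If the probe fails for the broken network but succeeds for a router on a working network, that points at the link-local path (bridge, iptables, routing) rather than at the agent itself.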



________________________________
From: Syahrul Sazli Shaharir <sazli@pulasan.my>
Sent: Monday, December 19, 2016 2:09 AM
To: users@cloudstack.apache.org
Subject: Re: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks

On Tue, Dec 13, 2016 at 7:26 PM, Syahrul Sazli Shaharir
<sazli@pulasan.my> wrote:
> Hi Simon,
>
> On Tue, Dec 13, 2016 at 10:31 AM, Simon Weller <sweller@ena.com> wrote:
>> Can you turn on agent debug mode and take a look at the debug level logs?
>>
>>
>> You can do that by running sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml
>> on the host and then restarting the agent.
>
> Here are the debug logs - patchviasocket.py executed OK but couldn't
> connect to the router VM's internal IP:-
>
> 2016-12-13 19:23:18,627 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Executing:
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
> -n r-669-VM -p %template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.3.7%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
> 2016-12-13 19:23:18,739 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Execution is
> successful.
> 2016-12-13 19:23:18,742 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:21,749 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:26,750 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Trying to connect to
> 169.254.3.7
> 2016-12-13 19:23:29,757 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Could not connect to
> 169.254.3.7
> 2016-12-13 19:23:29,869 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-5:null) (logid:981a5f6f) Processing command:
> com.cloud.agent.api.GetHostStatsCommand
> 2016-12-13 19:23:34,759 DEBUG
> [resource.virtualnetwork.VirtualRoutingResource]
> (agentRequest-Handler-4:null) (logid:0bf9a356) Unable to logon to
> 169.254.3.7
>
> virsh console also failed to show anything.

OK, after upgrading to the latest qemu-kvm-ev-2.6.0-27.1.el7, I did get
to the console at some stage this time, but patchviasocket.py still
times out. Here is the console output:

http://pastebin.com/n37aHeSa

Thanks.


>> ________________________________
>> From: Syahrul Sazli Shaharir <sazli@pulasan.my>
>> Sent: Monday, December 12, 2016 8:21 PM
>> To: users@cloudstack.apache.org
>> Subject: Router VM: patchviasocket.py timeout issue on 1 out of 4 networks
>>
>> Hi,
>>
>> I am running the latest CloudStack 4.9.0.1 in a CentOS 7 KVM + Ceph
>> environment. After running for some time, I hit an issue with
>> one out of 4 networks: following a heartbeat-induced reset on all
>> hosts, the associated virtual router would not get recreated and
>> started properly on any of the 3 hosts I have, even after repeated
>> attempts of the following:
>> - destroy-recreate cycles, via Cloudstack UI
>> - restartNetwork cleanup=true API calls (failed with errorcode = 530).
>> - redownload and reregister system VM template as another entry and
>> assign to router VM in global setting (boots the new template OK, but
>> still same problem)
>> - tweak default system offering for router VM (increased RAM from 256 to 512MB)
>> - created a new system offering, with the RAM tweak and use of the ceph rbd
>> store, and assigned it to Cloud.Com-SoftwareRouter as per the docs, which
>> didn't work for some reason: it kept on using the initial default offering
>> and created the image on local host storage
>> - upgrade to latest cloudstack (previously was running 4.8)
>>
>> As with a handful of others in this list's archives, virsh list and
>> dumpxml show the VM was created OK, but it failed soon after booting, as
>> seen in the following errors in agent.log:
>>
>> 2016-12-13 10:03:33,894 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.py
>> -n r-668-VM -p %template=domP%name=r-668-VM%eth0ip=10.3.28.10%eth0mask=255.255.255.0%gateway=10.3.28.1%domain=nocser.net%cidrsize=24%dhcprange=10.3.28.1%eth1ip=169.254.0.33%eth1mask=255.255.0.0%type=dhcpsrvr%disable_rp_filter=true%dns1=8.8.8.8%dns2=8.8.4.4%ip6dns1=%ip6dns2=%baremetalnotificationsecuritykey=uavJByNGGjNLrELG-qbdN99__1I3tnp8qa0KbcsKokKJcPB43K9s6oQu2nMLqo3YP8p6jqDy5XT3WWOWBA2yNw%baremetalnotificationapikey=8JH4mdkxsEMhgIBgMonkNXAEKjVOeZnG1m5UVekvvo4v_iXQ4ZS7rh6NNS0qphhc7ZrCauiz23tp2-Wa3AASlg%host=10.2.30.11%port=8080
>> .  Output is:
>> .....
>> 2016-12-13 10:05:45,895 WARN  [kvm.resource.LibvirtComputingResource]
>> (agentRequest-Handler-1:null) (logid:633e6e03) Timed out:
>> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh
>> vr_cfg.sh 169.254.0.33 -c
>> /var/cache/cloud/VR-48ea8a95-6c02-499f-88d3-eae5bf9f9fbe.cfg .  Output
>> is:
>>
>> As mentioned, this only happens with 1 network (always the same
>> network). The other router VMs work OK. Any clues on how to
>> troubleshoot this further, would be greatly appreciated.
>>
>> Thanks.
>>
>> --
>> --sazli
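
When comparing the failing router with the working ones, it can help to read the `-p` argument passed to patchviasocket.py field by field. The sketch below splits such a string into a dictionary. The `%key=value%key=value` layout is inferred only from the log lines quoted in this thread, not from the script's source, and the function name `parse_boot_args` is hypothetical. It assumes that no value contains a `%` character.

```python
def parse_boot_args(arg: str) -> dict:
    """Split a '%key=value%key=value' string into a {key: value} dict."""
    params = {}
    for field in arg.split("%"):
        if not field:
            # Skip the empty piece before the leading '%'.
            continue
        # partition keeps any further '=' characters inside the value.
        key, _, value = field.partition("=")
        params[key] = value
    return params


# Shortened sample taken from the agent log in this thread.
sample = "%template=domP%name=r-669-VM%eth0ip=10.3.28.10%eth1ip=169.254.3.7%ip6dns1=%port=8080"
args = parse_boot_args(sample)
```

Diffing the resulting dictionaries for a working and a failing router makes it easier to spot a differing field than comparing the raw log lines by eye.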
