cloudstack-users mailing list archives

From Andrei Mikhailovsky <and...@arhont.com>
Subject Re: KVM - No longer able to start virtual routers
Date Wed, 21 Oct 2015 22:09:29 GMT


Hi Simon, 

Chatting with the devs at ceph, we've managed to isolate the problem. It seems that the changes
introduced in ceph 0.94.4 broke the ability to start VMs that were created by cloning a
template. This affects virtual routers, but I am not sure why it had no effect on the CPVM
and SSVMs. Anyway, they suggested that I downgrade ceph to 0.94.3 on the client host servers,
but not on the osd/mon servers. I've done this, and ACS is back in a happy mood and all is
working again. 
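
For anyone checking their own client hosts, here is a minimal sketch that parses a ceph
version line (the format that appears in the libvirt log further down this thread) and flags
the one release reported as problematic here. The function names are made up for illustration,
and the check only knows about 0.94.4 because that is the only version this thread implicates;
it says nothing about other releases.

```python
import re

# The only release this thread reports as broken on client hosts.
KNOWN_BAD = (0, 94, 4)


def parse_ceph_version(text):
    """Return the ceph version found in text as a tuple of ints, or None."""
    match = re.search(r"ceph version (\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_known_bad(text):
    """True only when the version line names the release flagged in this thread."""
    return parse_ceph_version(text) == KNOWN_BAD


sample = "ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)"
print(parse_ceph_version(sample), is_known_bad(sample))
```

The same function can be fed the version line from any other source that uses this format.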

Thanks for your help guys, 

Andrei 

----- Original Message -----

From: "Simon Weller" <sweller@ena.com> 
To: users@cloudstack.apache.org 
Sent: Wednesday, 21 October, 2015 3:50:32 PM 
Subject: Re: KVM - No longer able to start virtual routers 

Andrei, 

Reading between the lines here...but based on the logs it looks to be related to locking.


Have you tried running the rbd lock list <block device> command on the block device that
libvirt is trying to use? 
You may need to clear the lock using rbd lock remove <block device> <lock id> <locker>. 
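
As a rough sketch of how to find the right image to inspect: the pool and image name can be
pulled out of the -drive argument that libvirt logs for the domain, and the rbd lock commands
built from them. The helper names below are hypothetical, the drive string is shortened from
the log quoted further down, and the lock id and locker values must come from the real output
of the list command, so they are left as placeholders.

```python
import re
import shlex


def rbd_image_from_drive(drive_arg):
    """Extract (pool, image) from a qemu -drive argument that uses file=rbd:pool/image."""
    match = re.search(r"file=rbd:([^/:,]+)/([^:,]+)", drive_arg)
    if match is None:
        return None
    return match.group(1), match.group(2)


def lock_list_command(pool, image):
    """Argument list for listing locks held on the image."""
    return ["rbd", "lock", "list", f"{pool}/{image}"]


def lock_remove_command(pool, image, lock_id, locker):
    """Argument list for removing one lock; lock_id and locker come from the list output."""
    return ["rbd", "lock", "remove", f"{pool}/{image}", lock_id, locker]


def render(command):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in command)


# Shortened drive argument; the authentication options are deliberately omitted.
drive = "file=rbd:Primary-ubuntu-1/c3f90fb4-c1a6-4e99-a2c0-64ae4517412e:id=admin,if=none"
pool, image = rbd_image_from_drive(drive)
print(render(lock_list_command(pool, image)))
print(render(lock_remove_command(pool, image, "<lock id>", "<locker>")))
```

The sketch only prints the commands; running them against a live cluster is left to the
operator, since removing a lock that is genuinely in use can be harmful.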

- Si 

________________________________________ 
From: Andrei Mikhailovsky <andrei@arhont.com> 
Sent: Wednesday, October 21, 2015 6:55 AM 
To: users@cloudstack.apache.org 
Subject: Re: KVM - No longer able to start virtual routers 

Right, a bit more digging around and the issue seems to relate to the ceph storage. Here is
the log from libvirt: 

cat r-1407-VM.log 
2015-10-21 11:04:59.262+0000: starting up 
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none
/usr/bin/kvm-spice -name r-1407-VM -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 256 -realtime
mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 815d2860-cc7f-475d-bf63-02814c720fe4 -no-user-config
-nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r-1407-VM.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6
-drive file=rbd:Primary-ubuntu-1/c3f90fb4-c1a6-4e99-a2c0-64ae4517412e:id=admin:key=AQDiDbJR2GqPABAAWCcsUQ+UQwK8z9c6LWrizw==:auth_supported=cephx\;none:mon_host=ceph-mon.csprdc.arhont.com\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive file=/usr/share/cloudstack-common/vms/systemvm.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none
-device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=54,id=hostnet0,vhost=on,vhostfd=55
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:2e:f7:00:18,bus=pci.0,addr=0x3,rombar=0,romfile=
-netdev tap,fd=56,id=hostnet1,vhost=on,vhostfd=57 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=0e:00:a9:fe:01:42,bus=pci.0,addr=0x4,rombar=0,romfile=
-netdev tap,fd=58,id=hostnet2,vhost=on,vhostfd=59 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=06:0c:b6:00:02:13,bus=pci.0,addr=0x5,rombar=0,romfile=
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/r-1407-VM.agent,server,nowait
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=r-1407-VM.vport
-device usb-tablet,id=input0 -vnc 192.168.169.2:10,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2

Domain id=42 is tainted: high-privileges 
libust[20136/20136]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user
tracing. (in setup_local_apps() at lttng-ust-comm.c:305) 
char device redirected to /dev/pts/13 (label charserial0) 
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const
object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const
bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7ffa6b7fe700 time 2015-10-21
12:05:07.901876 
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked()) 
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a) 
1: (()+0x17258b) [0x7ffa92ef758b] 
2: (()+0xa9573) [0x7ffa92e2e573] 
3: (()+0x3a90ca) [0x7ffa9312e0ca] 
4: (()+0x3b583d) [0x7ffa9313a83d] 
5: (()+0x7212c) [0x7ffa92df712c] 
6: (()+0x9590f) [0x7ffa92e1a90f] 
7: (()+0x969a3) [0x7ffa92e1b9a3] 
8: (()+0x4782a) [0x7ffa92dcc82a] 
9: (()+0x56599) [0x7ffa92ddb599] 
10: (()+0x7284e) [0x7ffa92df784e] 
11: (()+0x162b7e) [0x7ffa92ee7b7e] 
12: (()+0x163c10) [0x7ffa92ee8c10] 
13: (()+0x8182) [0x7ffa8ec49182] 
14: (clone()+0x6d) [0x7ffa8e97647d] 
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret
this. 
terminate called after throwing an instance of 'ceph::FailedAssertion' 
2015-10-21 11:05:08.091+0000: shutting down 
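
A small sketch for spotting this kind of failure in other domain logs: it scans log text for
ceph "FAILED assert" lines and records the ceph version line that follows each one. The
regular expressions are based only on the format seen in the log above, so they are an
assumption about how such lines look in general, and the function name is made up for
illustration.

```python
import re

# Matches lines such as "librbd/LibrbdWriteback.cc: 160: FAILED assert(...)".
ASSERT_RE = re.compile(r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$")
# Matches lines such as "ceph version 0.94.4 (<hash>)".
VERSION_RE = re.compile(r"^ceph version (?P<version>\S+)")


def find_failed_asserts(log_text):
    """Return one dict per failed assertion, with the ceph version that follows it."""
    findings = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        hit = ASSERT_RE.match(line)
        if hit:
            current = {
                "file": hit.group("file"),
                "line": int(hit.group("line")),
                "expr": hit.group("expr"),
                "version": None,
            }
            findings.append(current)
            continue
        version = VERSION_RE.match(line)
        if version and current is not None and current["version"] is None:
            current["version"] = version.group("version")
    return findings


sample = """char device redirected to /dev/pts/13 (label charserial0)
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
"""
for finding in find_failed_asserts(sample):
    print(finding)
```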



For some reason this only affects virtual routers and not the CPVM/SSVM or the guest VMs. Any
thoughts? 

I will post it to the ceph mailing list as well. Perhaps someone there will have a clue. 

Thanks 


----- Original Message ----- 

From: "Andrei Mikhailovsky" <andrei@arhont.com> 
To: users@cloudstack.apache.org 
Sent: Wednesday, 21 October, 2015 12:25:28 PM 
Subject: Re: KVM - No longer able to start virtual routers 

I also forgot to mention that the Console Proxy and SSVM virtual machines can be successfully
created. I've removed them and they were recreated by ACS without any issues. It's the virtual
routers which are not playing well. 

This tells me that there is no issue with the systemvm template or the storage servers. 

Andrei 



----- Original Message ----- 

From: "Andrei Mikhailovsky" <andrei@arhont.com> 
To: users@cloudstack.apache.org 
Sent: Wednesday, 21 October, 2015 11:36:28 AM 
Subject: KVM - No longer able to start virtual routers 

Hello guys, 

I have recently upgraded from ACS 4.5.1 to 4.5.2. The upgrade went well as far as I can tell,
with no error messages. As there was no need to update the systemvm templates, I've not bothered
to reboot the system VMs and virtual routers. 

I am running Ubuntu 14.04 with KVM hosts and using nfs for secondary and ceph rbd for primary
storage. 

Today I tried to restart one of the networks with the clean up option and noticed that
the restart failed. After a bit of poking around I've identified that ACS is no longer
able to create virtual routers. ACS just shows them with Starting status, but they are not
starting on the host server. I've looked at the host server which is supposed to be starting
that virtual router, and there is no sign of the domain being created. The host server has
the following entries in the agent.log file: 

2015-10-21 10:39:02,140 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null)
Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl -n r-1405-VM
-p %template=domP%name=r-1405-VM%eth2ip=XXX.XXX.XXX.53%eth2mask=255.255.255.128%gateway=XXX.XXX.XXX.1%eth0ip=10.1.1.1%eth0mask=255.255.255.0%domain=kaspersky.local%cidrsize=24%dhcprange=10.1.1.1%eth1ip=169.254.3.232%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=178.248.108.130%dns2=91.224.1.152

2015-10-21 10:39:02,172 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null)
Exit value is 111 
2015-10-21 10:39:02,173 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null)
ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection refused 
2015-10-21 10:39:02,174 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null)
passcmd failed:ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection
refused 
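
A side note on the "Exit value is 111" line above: on Linux, 111 is the errno number for
ECONNREFUSED, which lines up with the "Connection refused" text. That patchviasocket.pl exits
with the errno of the failed connect is an assumption here, not something confirmed in this
thread. The sketch below, with a made-up function name, simply looks up a numeric value in the
errno table of the platform it runs on.

```python
import errno
import os


def describe_exit_value(code):
    """Describe code as an errno value on the current platform, if it is one."""
    name = errno.errorcode.get(code)
    if name is None:
        return f"{code}: not a known errno on this platform"
    return f"{code}: {name} ({os.strerror(code)})"


# 111 is the value from the agent.log excerpt; the result depends on the platform.
print(describe_exit_value(111))
```

Errno numbers differ between operating systems, so the lookup is only meaningful when run on
the same kind of system as the KVM host.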



I've checked ssvm and it seems to be running perfectly well. The /usr/local/cloud/systemvm/ssvm-check.sh
script produces no warnings or errors, the nfs mount point is mounted and writable. 

I've also tried to create a new vm with a new network and that vm is not starting because
ACS is unable to start the virtual router for the network. 


Any idea how to get this issue resolved? 

Thanks 




