Date: Wed, 21 Oct 2015 23:09:29 +0100 (BST)
From: Andrei Mikhailovsky
To: users@cloudstack.apache.org
Message-ID: <30270268.2011.1445465372317.JavaMail.andrei@tuchka>
References: <569463.1450.1445423791085.JavaMail.andrei@tuchka> <10580077.1502.1445426734007.JavaMail.andrei@tuchka> <10242977.1558.1445428520619.JavaMail.andrei@tuchka>
Subject: Re: KVM - No longer able to start virtual routers
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_2010_32002990.1445465372317"

------=_Part_2010_32002990.1445465372317
Content-Type: text/plain; charset=utf-8

Hi Simon,

Chatting with the devs at ceph we've managed to isolate the problem. It seems that the changes introduced in version 0.94.4 of ceph broke the ability to start vms which were created from a template by cloning. This affects virtual routers, but I am not sure why it didn't have any effect on the cpvm and ssvms.
Anyway, they've suggested that I downgrade ceph to 0.94.3 on the client host servers, but not on the osd/mon servers. I've done this and ACS is back in a happy mood and all is working again.

Thanks for your help guys,

Andrei

----- Original Message -----
From: "Simon Weller"
To: users@cloudstack.apache.org
Sent: Wednesday, 21 October, 2015 3:50:32 PM
Subject: Re: KVM - No longer able to start virtual routers

Andrei,

Reading between the lines here...but based on the logs it looks to be related to locking.

Have you tried running the rbd lock list command on the block device that libvirt is trying to use?

You may need to clear the lock using rbd lock remove.

- Si

________________________________________
From: Andrei Mikhailovsky
Sent: Wednesday, October 21, 2015 6:55 AM
To: users@cloudstack.apache.org
Subject: Re: KVM - No longer able to start virtual routers

Right, a bit more digging around and the issue seems to relate to the ceph storage.
Here is the log from libvirt:

cat r-1407-VM.log

2015-10-21 11:04:59.262+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name r-1407-VM -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 815d2860-cc7f-475d-bf63-02814c720fe4 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r-1407-VM.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=rbd:Primary-ubuntu-1/c3f90fb4-c1a6-4e99-a2c0-64ae4517412e:id=admin:key=AQDiDbJR2GqPABAAWCcsUQ+UQwK8z9c6LWrizw==:auth_supported=cephx\;none:mon_host=ceph-mon.csprdc.arhont.com\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/usr/share/cloudstack-common/vms/systemvm.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=54,id=hostnet0,vhost=on,vhostfd=55 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:2e:f7:00:18,bus=pci.0,addr=0x3,rombar=0,romfile= -netdev tap,fd=56,id=hostnet1,vhost=on,vhostfd=57 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=0e:00:a9:fe:01:42,bus=pci.0,addr=0x4,rombar=0,romfile= -netdev tap,fd=58,id=hostnet2,vhost=on,vhostfd=59 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=06:0c:b6:00:02:13,bus=pci.0,addr=0x5,rombar=0,romfile= -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/r-1407-VM.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=r-1407-VM.vport -device usb-tablet,id=input0 -vnc 192.168.169.2:10,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
Domain id=42 is tainted: high-privileges
libust[20136/20136]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
char device redirected to /dev/pts/13 (label charserial0)
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7ffa6b7fe700 time 2015-10-21 12:05:07.901876
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: (()+0x17258b) [0x7ffa92ef758b]
2: (()+0xa9573) [0x7ffa92e2e573]
3: (()+0x3a90ca) [0x7ffa9312e0ca]
4: (()+0x3b583d) [0x7ffa9313a83d]
5: (()+0x7212c) [0x7ffa92df712c]
6: (()+0x9590f) [0x7ffa92e1a90f]
7: (()+0x969a3) [0x7ffa92e1b9a3]
8: (()+0x4782a) [0x7ffa92dcc82a]
9: (()+0x56599) [0x7ffa92ddb599]
10: (()+0x7284e) [0x7ffa92df784e]
11: (()+0x162b7e) [0x7ffa92ee7b7e]
12: (()+0x163c10) [0x7ffa92ee8c10]
13: (()+0x8182) [0x7ffa8ec49182]
14: (clone()+0x6d) [0x7ffa8e97647d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
2015-10-21 11:05:08.091+0000: shutting down

For some reason this only affects virtual routers and not CPVM/SSVM or the vm guests. Any thoughts?

I will post it to the ceph mailing list as well.
Perhaps someone there will have a clue.

Thanks

----- Original Message -----
From: "Andrei Mikhailovsky"
To: users@cloudstack.apache.org
Sent: Wednesday, 21 October, 2015 12:25:28 PM
Subject: Re: KVM - No longer able to start virtual routers

I have also forgotten to mention that the Console Proxy and SSVM virtual machines can be successfully created. I've removed them and they were recreated by ACS without any issues. It's the virtual routers which are not playing well.

This tells me that there is no issue with the systemvm template or the storage servers.

Andrei

----- Original Message -----
From: "Andrei Mikhailovsky"
To: users@cloudstack.apache.org
Sent: Wednesday, 21 October, 2015 11:36:28 AM
Subject: KVM - No longer able to start virtual routers

Hello guys,

I have recently upgraded from ACS 4.5.1 to 4.5.2. The upgrade went well as far as I can tell, with no error messages. As there was no need to update the systemvm templates, I've not bothered to reboot the system vms and virtual routers.

I am running Ubuntu 14.04 with KVM hosts and using nfs for secondary and ceph rbd for primary storage.

Today I've tried to restart one of the networks with the clean up option and noticed that the restart has failed. After a bit of poking around I've identified that ACS is no longer able to create virtual routers. ACS is just showing them with Starting status, but they are not starting on the host server. I've looked at the host server, which is supposed to be starting that virtual router, and there is no sign of the domain being created.
The host server has the following entries in the agent.log file:

2015-10-21 10:39:02,140 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl -n r-1405-VM -p %template=domP%name=r-1405-VM%eth2ip=XXX.XXX.XXX.53%eth2mask=255.255.255.128%gateway=XXX.XXX.XXX.1%eth0ip=10.1.1.1%eth0mask=255.255.255.0%domain=kaspersky.local%cidrsize=24%dhcprange=10.1.1.1%eth1ip=169.254.3.232%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=178.248.108.130%dns2=91.224.1.152
2015-10-21 10:39:02,172 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Exit value is 111
2015-10-21 10:39:02,173 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection refused
2015-10-21 10:39:02,174 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) passcmd failed:ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection refused

I've checked ssvm and it seems to be running perfectly well. The /usr/local/cloud/systemvm/ssvm-check.sh script produces no warnings or errors, and the nfs mount point is mounted and writable.

I've also tried to create a new vm with a new network and that vm is not starting because ACS is unable to start the virtual router for the network.

Any idea how to get this issue resolved?

Thanks

------=_Part_2010_32002990.1445465372317--
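
For anyone hitting the same symptom, the check that exposed the cause in this thread (a librbd assertion in the domain's QEMU log, followed by the ceph version line) can be sketched in a few lines of Python. This is only a rough sketch, not something posted by the participants: the function name find_librbd_assert and the sample text are made up for illustration, and the two patterns are based solely on the log lines quoted above, so other ceph or QEMU versions may format these lines differently.

```python
import re
from typing import Optional

# Patterns modelled on the two log lines quoted in this thread (an assumption:
# other ceph releases may word these lines differently).
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)
VERSION_RE = re.compile(r"^ceph version (?P<version>\d+(?:\.\d+)*)\b")


def find_librbd_assert(log_text: str) -> Optional[dict]:
    """Return details of the first 'FAILED assert' line in a QEMU domain log.

    The ceph version is taken from the first 'ceph version ...' line that
    follows the assertion; it stays None if no such line is present.
    Returns None when the log contains no failed assertion at all.
    """
    result = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if result is None:
            match = ASSERT_RE.match(line)
            if match:
                result = {
                    "file": match["file"],
                    "line": int(match["line"]),
                    "expr": match["expr"],
                    "ceph_version": None,
                }
        else:
            version = VERSION_RE.match(line)
            if version:
                result["ceph_version"] = version["version"]
                break
    return result


# Hypothetical excerpt, shortened from the r-1407-VM.log lines quoted above.
SAMPLE_LOG = """\
2015-10-21 11:04:59.262+0000: starting up
char device redirected to /dev/pts/13 (label charserial0)
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
2015-10-21 11:05:08.091+0000: shutting down
"""

if __name__ == "__main__":
    print(find_librbd_assert(SAMPLE_LOG))
```

To use it on a real host, read the text of the domain log (r-1407-VM.log in the message above) and pass it to find_librbd_assert. A result whose ceph_version is 0.94.4 would match the client-side librbd problem described at the top of this thread; a None result suggests looking elsewhere, for example at the agent.log entries.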