From: "Richard Klein (RSI)"
To: "users@cloudstack.apache.org"
Date: Sat, 16 Apr 2016 12:17:50 -0500
Subject: RE: Primary storage not mounted on hosts?

Thanks for the advice. I found the problem and got it resolved. While restarting the agent (with debug enabled per your suggestion) I did a tail/grep using the UUID of the primary storage and discovered that during the mount/add-to-libvirt process it was getting an I/O error on the UUID of a QCOW2 volume. Below is a snippet from the tail/grep. So I stopped the agent, mounted primary storage manually and tried to copy the file named in the log. Sure enough I got an I/O error. I then copied some other random small files and they were OK, so it appeared that this one volume was corrupt.

I looked up the volume UUID in the volumes table and found the instance it belonged to, which was a stopped VR. I destroyed the VR and started the agent. I still got the I/O error because the volume was still there (it probably hadn't gone through the expunge process yet). I stopped the agent, manually moved the file to a temp directory and then started the agent. Everything worked normally after that: it added the primary storage and started to turn on VRs.
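In case it helps anyone hitting the same thing, here is roughly that check sequence as a sketch. The scratch mount point and MySQL credentials are placeholders, and the volumes-table column names are from memory, so verify them against your schema first:

----
# Stop the agent so nothing else touches the pool while testing
service cloudstack-agent stop

# Mount the Gluster primary storage by hand on a scratch mount point
mkdir -p /mnt/pstest
mount -t glusterfs gv0cl1.pod1.aus1.centex.rsitex.com:/gv0cl1 /mnt/pstest

# Try to read the volume the agent log complained about; a corrupt
# file should reproduce the Input/output error
cp /mnt/pstest/52a6130f-266e-4667-8ca2-89f932a0b254 /tmp/

# Map the volume file back to its instance in the cloud database
mysql -u cloud -p cloud -e "SELECT id, name, instance_id, state \
  FROM volumes WHERE path = '52a6130f-266e-4667-8ca2-89f932a0b254';"
----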
I then restarted the agents on all hosts and everything started working again. It behaved as if, during the process of adding the pool to libvirt, all of the volumes are examined to get information about them. Because this one volume was corrupt, the pool could not be added. At least that is my theory.

I do still have one problem: the system VMs are stuck in a Starting state, I think due to the timing of the agent restarts. When I look on the host they are "starting" on, I don't see them with the "virsh list" command. I am going to give them time in case it's a workload issue, but if they are still starting after an hour or so I will probably change their database status to Stopped and then let them be recreated (a rough sketch of that query is below, after the log snippet).

Thanks for the help!

Here is the agent log snippet:

----
tail -f /var/log/cloudstack/agent/agent.log | grep "c3991ea2\-b702\-3b1b\-bfc5\-69cb7d928554"

2016-04-16 10:43:00,245 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:30562dd3) Request:Seq 46-5281314988022038529:  { Cmd , MgmtId: 345049993464, via: 46, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":5,"uuid":"c3991ea2-b702-3b1b-bfc5-69cb7d928554","host":"gv0cl1.pod1.aus1.centex.rsitex.com","path":"/gv0cl1","port":24007,"type":"Gluster"},"localPath":"/mnt//c3991ea2-b702-3b1b-bfc5-69cb7d928554","wait":0}}] }
2016-04-16 10:43:00,318 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 (Gluster) in libvirt
2016-04-16 10:43:00,322 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 was not found running in libvirt. Need to create it.
2016-04-16 10:43:00,322 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Didn't find an existing storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 by UUID, checking for pools with duplicate paths
2016-04-16 10:43:00,325 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554
[libvirt pool XML follows; the markup was stripped by the list archive, leaving only the pool name, UUID and target path:]
c3991ea2-b702-3b1b-bfc5-69cb7d928554
c3991ea2-b702-3b1b-bfc5-69cb7d928554
/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554
2016-04-16 10:43:00,775 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254': Input/output error
org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254': Input/output error
----
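As mentioned above, if the system VMs are still stuck after an hour or so, this is roughly the last-resort state change I have in mind. It is only a sketch: stop the management server first, take a database backup, and verify the table/column names and which VMs are actually stuck before running anything like it:

----
service cloudstack-management stop

# Mark the stuck system VMs as Stopped (credentials and the WHERE
# clause are placeholders; target only the VMs that are actually stuck)
mysql -u cloud -p cloud -e "UPDATE vm_instance SET state = 'Stopped' \
  WHERE type IN ('ConsoleProxy','SecondaryStorageVm') AND state = 'Starting';"

service cloudstack-management start
# On startup the management server should recreate/start the system VMs
----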
Richard Klein

RSI
5426 Guadalupe, Suite 100
Austin TX 78751
RSI Help Desk: (512) 334-3334
Phone: (512) 275-0358
Fax: (512) 328-3410

> -----Original Message-----
> From: Simon Weller [mailto:sweller@ena.com]
> Sent: Friday, April 15, 2016 8:47 PM
> To: users@cloudstack.apache.org
> Subject: Re: Primary storage not mounted on hosts?
>
> Richard,
>
> The cloudstack-agent should populate the libvirt pool list when it starts up.
> Have you tried restarting libvirtd and then restarting the cloudstack-agent?
>
> You may want to turn up debugging on the agent so you get some more detail
> on what's going on.
> You can do this by modifying /etc/cloudstack/agent/log4j-cloud.xml
> See this wiki article for more details:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+agent+debug
>
> - Si
>
> ________________________________________
> From: Richard Klein (RSI)
> Sent: Friday, April 15, 2016 6:54 PM
> To: users@cloudstack.apache.org
> Subject: Primary storage not mounted on hosts?
>
> I am not sure what happened, but our primary storage (Gluster) is no longer
> mounted on any of our hosts. When I do "virsh pool-list" on any host I only
> see the local pool. Gluster itself is working fine; I can mount the Gluster
> volume manually on any of the hosts and see the primary storage. Instances
> that are running can write data to the local volume and pull data from it.
> But if a VM is stopped it can't start again. I get the "Unable to create a
> New VM - Error message: Unable to start instance due to Unable to get
> answer that is of class com.cloud.agent.api.StartAnswer" error that I have
> seen in a thread on this mailing list, and I am sure it's primary storage
> related.
>
> The agent logs on the hosts show the following entries, which confirm the
> agent is looking for primary storage:
>
> 2016-04-15 18:42:34,838 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:ad8ec05a) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:19,006 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:49,010 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
>
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 is the UUID of our primary storage.
>
> We did have some secondary storage issues (NFS) that caused some NFS
> mounts to secondary storage to hang. The only way to recover was to reboot
> the host. There were 2 hosts affected, so I put each host in maintenance
> mode, rebooted, and then cancelled maintenance mode. I did this one host at
> a time. It seems like ever since this happened I have had issues.
>
> Is there a way to get the primary storage remounted and added to the
> libvirt pool list while keeping the VMs up and running? At this point the
> only idea I have to recover is to power off all VMs, disable primary
> storage, then enable it again. This is a little extreme and is a last
> resort, but I don't know what other options I have.
>
> Any suggestions?
>
>
> Richard Klein
> RSI
> 5426 Guadalupe, Suite 100
> Austin TX 78751
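For reference, a rough sketch of the restart-and-debug sequence Si suggests above. Service names assume a RHEL/CentOS-style install, and the sed edit is the blunt version of the log4j change; see the wiki page above for the proper per-category edit:

----
# Restart libvirt first, then the agent, so the agent re-registers its pools
service libvirtd restart
service cloudstack-agent restart

# Blunt way to turn up agent logging: flip INFO to DEBUG in the agent's
# log4j config (back it up first), then restart the agent
cp /etc/cloudstack/agent/log4j-cloud.xml /etc/cloudstack/agent/log4j-cloud.xml.bak
sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml
service cloudstack-agent restart

# Confirm the primary storage pool is back
virsh pool-list --all
----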