From: "Richard Klein (RSI)"
To: "users@cloudstack.apache.org"
Date: Sat, 16 Apr 2016 12:17:50 -0500
Subject: RE: Primary storage not mounted on hosts?

Thanks for the advice. I found the problem and got it resolved. While restarting the agent (with debug enabled per your suggestion) I did a tail/grep using the UUID of the primary storage and discovered that during the mount/add-to-libvirt process it was getting an I/O error on the UUID of a QCOW2 volume. Below is a snippet from the tail/grep. So I stopped the agent, mounted primary storage manually and tried to copy the file named in the log. Sure enough I got an I/O error. I then copied some other random small files and they were OK, so it appeared that this one volume was corrupt.

I looked up the volume UUID in the volumes table and found the instance it belonged to, which was a stopped VR. I destroyed the VR and started the agent. I still got the I/O error because the volume was still there (it probably hadn't gone through the expunge process yet). I stopped the agent, manually moved the file to a temp directory and then started the agent. Everything worked normally after that: it added the primary storage and started to turn on VRs.
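In case it helps anyone hitting the same thing, here is roughly that check sequence as a sketch. The scratch mount point and MySQL credentials are placeholders, and the volumes-table column names are from memory, so verify them against your schema first:

----
# Stop the agent so nothing else touches the pool while testing
service cloudstack-agent stop

# Mount the Gluster primary storage by hand on a scratch mount point
mkdir -p /mnt/pstest
mount -t glusterfs gv0cl1.pod1.aus1.centex.rsitex.com:/gv0cl1 /mnt/pstest

# Try to read the volume the agent log complained about; a corrupt
# file should reproduce the Input/output error
cp /mnt/pstest/52a6130f-266e-4667-8ca2-89f932a0b254 /tmp/

# Map the volume file back to its instance in the cloud database
mysql -u cloud -p cloud -e "SELECT id, name, instance_id, state \
  FROM volumes WHERE path = '52a6130f-266e-4667-8ca2-89f932a0b254';"
----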
I then restarted the agents on all hosts and everything started working again. It behaved as if, during the process of adding the pool to libvirt, all of the volumes are examined to get information about them. Because this one volume was corrupt, the pool could not be added. At least that is my theory.

I do still have one problem: the system VMs are stuck in a Starting state, I think due to the timing of the agent restarts. When I look on the host they are "starting" on, I don't see them with the "virsh list" command. I am going to give them time in case it's a workload issue, but if they are still starting after an hour or so I will probably change their database status to Stopped and then let them be recreated (a rough sketch of that query is below, after the log snippet).

Thanks for the help!

Here is the agent log snippet:

----
tail -f /var/log/cloudstack/agent/agent.log | grep "c3991ea2\-b702\-3b1b\-bfc5\-69cb7d928554"

2016-04-16 10:43:00,245 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:30562dd3) Request:Seq 46-5281314988022038529:  { Cmd , MgmtId: 345049993464, via: 46, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":5,"uuid":"c3991ea2-b702-3b1b-bfc5-69cb7d928554","host":"gv0cl1.pod1.aus1.centex.rsitex.com","path":"/gv0cl1","port":24007,"type":"Gluster"},"localPath":"/mnt//c3991ea2-b702-3b1b-bfc5-69cb7d928554","wait":0}}] }
2016-04-16 10:43:00,318 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 (Gluster) in libvirt
2016-04-16 10:43:00,322 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 was not found running in libvirt. Need to create it.
2016-04-16 10:43:00,322 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Didn't find an existing storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 by UUID, checking for pools with duplicate paths
2016-04-16 10:43:00,325 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554
[libvirt pool XML follows; the markup was stripped by the list archive, leaving only the pool name, UUID and target path:]
c3991ea2-b702-3b1b-bfc5-69cb7d928554
c3991ea2-b702-3b1b-bfc5-69cb7d928554
/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554
2016-04-16 10:43:00,775 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) (logid:30562dd3) org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254': Input/output error
org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254': Input/output error
----
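As mentioned above, if the system VMs are still stuck after an hour or so, this is roughly the last-resort state change I have in mind. It is only a sketch: stop the management server first, take a database backup, and verify the table/column names and which VMs are actually stuck before running anything like it:

----
service cloudstack-management stop

# Mark the stuck system VMs as Stopped (credentials and the WHERE
# clause are placeholders; target only the VMs that are actually stuck)
mysql -u cloud -p cloud -e "UPDATE vm_instance SET state = 'Stopped' \
  WHERE type IN ('ConsoleProxy','SecondaryStorageVm') AND state = 'Starting';"

service cloudstack-management start
# On startup the management server should recreate/start the system VMs
----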
Richard Klein

RSI
5426 Guadalupe, Suite 100
Austin TX 78751
RSI Help Desk: (512) 334-3334
Phone: (512) 275-0358
Fax: (512) 328-3410

> -----Original Message-----
> From: Simon Weller [mailto:sweller@ena.com]
> Sent: Friday, April 15, 2016 8:47 PM
> To: users@cloudstack.apache.org
> Subject: Re: Primary storage not mounted on hosts?
>
> Richard,
>
> The cloudstack-agent should populate the libvirt pool list when it starts up.
> Have you tried restarting libvirtd and then restarting the cloudstack-agent?
>
> You may want to turn up debugging on the agent so you get some more detail
> on what's going on.
> You can do this by modifying /etc/cloudstack/agent/log4j-cloud.xml
> See this wiki article for more details:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+agent+debug
>
> - Si
>
> ________________________________________
> From: Richard Klein (RSI)
> Sent: Friday, April 15, 2016 6:54 PM
> To: users@cloudstack.apache.org
> Subject: Primary storage not mounted on hosts?
>
> I am not sure what happened, but our primary storage (Gluster) is no longer
> mounted on any of our hosts. When I do "virsh pool-list" on any host I only
> see the local pool. Gluster itself is working fine; I can mount the Gluster
> volume manually on any of the hosts and see the primary storage. Instances
> that are running can write data to the local volume and pull data from it.
> But if a VM is stopped it can't start again. I get the "Unable to create a
> New VM - Error message: Unable to start instance due to Unable to get
> answer that is of class com.cloud.agent.api.StartAnswer" error that I have
> seen in a thread on this mailing list, and I am sure it's primary storage
> related.
>
> The agent logs on the hosts show the following entries, which confirm the
> agent is looking for primary storage:
>
> 2016-04-15 18:42:34,838 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:ad8ec05a) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:19,006 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:49,010 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
>
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 is the UUID of our primary storage.
>
> We did have some secondary storage issues (NFS) that caused some NFS
> mounts to secondary storage to hang. The only way to recover was to reboot
> the host. There were 2 hosts affected, so I put each host in maintenance
> mode, rebooted, and then cancelled maintenance mode. I did this one host at
> a time. It seems like ever since this happened I have had issues.
>
> Is there a way to get the primary storage remounted and added to the
> libvirt pool list while keeping the VMs up and running? At this point the
> only idea I have to recover is to power off all VMs, disable primary
> storage, then enable it again. This is a little extreme and is a last
> resort, but I don't know what other options I have.
>
> Any suggestions?
>
>
> Richard Klein
> RSI
> 5426 Guadalupe, Suite 100
> Austin TX 78751
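For reference, a rough sketch of the restart-and-debug sequence Si suggests above. Service names assume a RHEL/CentOS-style install, and the sed edit is the blunt version of the log4j change; see the wiki page above for the proper per-category edit:

----
# Restart libvirt first, then the agent, so the agent re-registers its pools
service libvirtd restart
service cloudstack-agent restart

# Blunt way to turn up agent logging: flip INFO to DEBUG in the agent's
# log4j config (back it up first), then restart the agent
cp /etc/cloudstack/agent/log4j-cloud.xml /etc/cloudstack/agent/log4j-cloud.xml.bak
sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml
service cloudstack-agent restart

# Confirm the primary storage pool is back
virsh pool-list --all
----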