cloudstack-users mailing list archives

From "Richard Klein (RSI)" <rkl...@rsitex.com>
Subject RE: Primary storage not mounted on hosts?
Date Sat, 16 Apr 2016 17:17:50 GMT
Thanks for the advice.  I found the problem and got it resolved.  While the agent was starting
(with debug enabled per your suggestion) I did a tail/grep using the UUID of primary storage
and discovered that during the mount/add-to-libvirt process it was getting an I/O error on the
UUID of a QCOW2 volume.  Below is a snippet from the tail/grep.  So I stopped the agent, mounted
primary storage manually, and tried to copy the file named in the log.  Sure enough, I got an
I/O error.  I then copied some other random small files and they were OK, so it appeared that
this one volume was corrupt.
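For reference, the checks described above were roughly the following.  This is just a sketch: the mount point and file UUIDs are taken from the log snippet below, and the Gluster host/volume are from our pool definition, so adjust for your own setup.

```shell
# Watch the agent log for activity on the primary storage pool UUID
tail -f /var/log/cloudstack/agent/agent.log | grep "c3991ea2-b702-3b1b-bfc5-69cb7d928554"

# With the agent stopped, mount the Gluster volume by hand
systemctl stop cloudstack-agent
mount -t glusterfs gv0cl1.pod1.aus1.centex.rsitex.com:/gv0cl1 \
    /mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554

# Try to read the volume file named in the libvirt error;
# a corrupt file fails here with an Input/output error
cp /mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254 /tmp/
```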

I looked up the volume UUID in the volumes table and found the instance it belonged to, which
was a stopped VR.  I destroyed the VR and started the agent.  I still got the I/O error because
the volume was still there (it probably hadn't gone through the expunge process yet).  I stopped
the agent, manually moved the file to a temp directory, and then started the agent.  Everything
worked normally after that: it added the primary storage and started to turn on VRs.  I then
restarted the agents on all hosts and everything started working again.
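The volume-to-instance lookup was done against the cloud database.  A rough sketch of the queries (the `cloud` database name is the default, and the column names are from the CloudStack schema; the file UUID is the one from the error below):

```shell
# Volume files on primary storage are named by the volume's path (a UUID);
# find which volume row and instance the failing file belongs to
mysql -u cloud -p cloud -e \
  "SELECT id, name, instance_id, state FROM volumes \
   WHERE path = '52a6130f-266e-4667-8ca2-89f932a0b254';"

# Then identify the owning instance (in our case a stopped virtual router),
# substituting the instance_id returned above
mysql -u cloud -p cloud -e \
  "SELECT id, name, type, state FROM vm_instance WHERE id = <instance_id>;"
```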

It behaved as if, during the process of adding the pool to libvirt, all of the volumes are
examined to gather information about them.  Because this one volume was corrupt, the pool
could not be added.  At least that is my theory.

I do still have one problem.  The system VMs are stuck in a Starting state, I think due to
the timing of the agent restarts.  When I look on the host they are "starting" on, I don't see
them with the "virsh list" command.  I am going to give them time in case it's a workload
issue, but if they are still starting after an hour or so I will probably change their database
status to Stopped and then recreate them.
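If they are still stuck, the plan above would look roughly like this.  The instance id is hypothetical, and editing the database directly is a last resort: back up the database first and be sure the VM really isn't running on any host.

```shell
# On the host the VM is supposedly starting on: is it actually defined anywhere?
# (system VMs show up in libvirt as s-<id>-VM, v-<id>-VM, r-<id>-VM)
virsh list --all

# Last resort: mark the stuck instance Stopped in the database so CloudStack
# can destroy and recreate it (id 42 is a placeholder)
mysql -u cloud -p cloud -e \
  "UPDATE vm_instance SET state = 'Stopped' WHERE id = 42 AND state = 'Starting';"
```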

Thanks for the help!

Here is the agent log snippet:
----
tail -f /var/log/cloudstack/agent/agent.log | grep "c3991ea2\-b702\-3b1b\-bfc5\-69cb7d928554"
2016-04-16 10:43:00,245 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:30562dd3)
Request:Seq 46-5281314988022038529:  { Cmd , MgmtId: 345049993464, via: 46, Ver: v1, Flags:
100011, [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":5,"uuid":"c3991ea2-b702-3b1b-bfc5-69cb7d928554","host":"gv0cl1.pod1.aus1.centex.rsitex.com","path":"/gv0cl1","port":24007,"type":"Gluster"},"localPath":"/mnt//c3991ea2-b702-3b1b-bfc5-69cb7d928554","wait":0}}]
}
2016-04-16 10:43:00,318 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null)
(logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 (Gluster)
in libvirt
2016-04-16 10:43:00,322 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null)
(logid:30562dd3) Storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 was not found running in
libvirt. Need to create it.
2016-04-16 10:43:00,322 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null)
(logid:30562dd3) Didn't find an existing storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554
by UUID, checking for pools with duplicate paths
2016-04-16 10:43:00,325 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null)
(logid:30562dd3) Attempting to create storage pool c3991ea2-b702-3b1b-bfc5-69cb7d928554
<name>c3991ea2-b702-3b1b-bfc5-69cb7d928554</name>
<uuid>c3991ea2-b702-3b1b-bfc5-69cb7d928554</uuid>
<path>/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554</path>
2016-04-16 10:43:00,775 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null)
(logid:30562dd3) org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254':
Input/output error
org.libvirt.LibvirtException: cannot read header '/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254':
Input/output error

----
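As a cross-check that the failure really was the image file and not libvirt itself, the header read can be reproduced directly with qemu-img (same path as in the error above); on the corrupt volume this should fail the same way:

```shell
# qemu-img reads the QCOW2 header, which is exactly what libvirt choked on;
# on a corrupt volume this exits non-zero with an Input/output error
qemu-img info /mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254
```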

Richard Klein  <rklein@rsitex.com> 
RSI 
5426 Guadalupe, Suite 100 
Austin TX 78751 
RSI Help Desk:  (512) 334-3334 
Phone:  (512) 275-0358 
Fax:  (512)  328-3410






> -----Original Message-----
> From: Simon Weller [mailto:sweller@ena.com]
> Sent: Friday, April 15, 2016 8:47 PM
> To: users@cloudstack.apache.org
> Subject: Re: Primary storage not mounted on hosts?
> 
> Richard,
> 
> The Cloudstack-agent should populate the libvirt pool-list when it starts up.
> Have you tried restarting libvirtd and then restarting the Cloudstack-agent?
> 
> You may want to turn up debugging on the agent so you get some more detail
> on what's going on.
> You can do this by modifying /etc/cloudstack/agent/log4j-cloud.xml
> See this wiki article for more details:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+agent+debug
> 
> - Si
> 
> ________________________________________
> From: Richard Klein (RSI) <rklein@rsitex.com>
> Sent: Friday, April 15, 2016 6:54 PM
> To: users@cloudstack.apache.org
> Subject: Primary storage not mounted on hosts?
> 
> I am not sure what happened but our primary storage, which is Gluster, on all
> our hosts is not mounted anymore.  When I do "virsh pool-list" on any host I
> only see the local pool.  Gluster is working fine and there are no problems with
> it because I can mount the Gluster volume manually on any of the hosts and
> see the primary storage.  Instances that are running can write data to the local
> volume and pull data from it.  But if a VM is stopped it can't start again.  I get
> the "Unable to create a New VM - Error message: Unable to start instance due
> to Unable to get answer that is of class com.cloud.agent.api.StartAnswer" error
> that I have seen in a thread in this mailing list, and I am sure it's primary storage related.
> 
> The agent logs on the hosts are issuing the following log snippets, which
> confirm it's looking for primary storage:
> 
> 2016-04-15 18:42:34,838 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:ad8ec05a) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:19,006 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:49,010 INFO  [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 
> The c3991ea2-b702-3b1b-bfc5-69cb7d928554 is the UUID of our primary
> storage.
> 
> We did have some secondary storage issues (NFS) that caused some NFS
> mounts to secondary storage to hang.  The only way to recover was to reboot
> the host.  There were 2 hosts affected, so I put each host in maintenance mode,
> rebooted, and then canceled maintenance mode.  I did this one host at a time.
> It seems like ever since this happened I have had issues.
> 
> Is there a way to get the primary storage remounted and added to libvirt pool-
> list while keeping the VMs up and running?  At this point the only idea I have to
> recover is to power off all VMs, disable primary storage then enable it again.
> This is a little extreme and is a last resort but I don't know what other options I
> have.
> 
> Any suggestions?
> 
> 
> Richard Klein  <rklein@rsitex.com>
> RSI
> 5426 Guadalupe, Suite 100
> Austin TX 78751
> 

