cloudstack-users mailing list archives

From: Kirk Kosinski <kirkkosin...@gmail.com>
Subject: Re: Help! After network outage, can't start System VMs; focused debug info attached
Date: Wed, 18 Sep 2013 02:01:22 GMT

Hi, secondary storage is only mounted on an as-needed basis.  When a KVM
or XenServer host needs to do something on secondary storage, it will
mount the full path it needs (e.g. nfshost:/share/template/tmpl/2/123),
do what it needs to do, and unmount it.

The error seems to be that CloudStack is looking for and not finding a
volume (qcow2 disk) named "f23a16e7-b628-429e-83e1-698935588465" on the
NFS primary storage.  This file seems to be the system VM template.
Does this file exist or not?  I'd guess not, since CS says it can't find it.
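
If the agent.log is long, a short script can pull out every volume name
that libvirt reports as missing, to confirm it is always the same one.
This is only a sketch in Python: the function name missing_volumes is made
up for illustration, and the pattern is based solely on the error text
quoted in this thread.

```python
import re

# Pattern taken from the error text in this thread; \s+ tolerates a line
# break or extra spaces between "vol" and "with".
MISSING_VOL = re.compile(
    r"no storage vol\s+with matching name '([0-9a-fA-F-]+)'"
)


def missing_volumes(log_text):
    """Return the distinct volume names reported as missing, sorted."""
    return sorted(set(MISSING_VOL.findall(log_text)))


if __name__ == "__main__":
    sample = (
        "org.libvirt.LibvirtException: Storage volume not found: "
        "no storage vol with matching name "
        "'f23a16e7-b628-429e-83e1-698935588465'"
    )
    print(missing_volumes(sample))
```

Run it against the raw agent.log on the host rather than a mail-quoted
copy, since the "> " quote prefixes would break the match.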

Check the status of this volume in the template_spool_ref table:
SELECT * FROM template_spool_ref WHERE local_path =
'f23a16e7-b628-429e-83e1-698935588465'\G

If it shows up in the database with download_state = DOWNLOADED but it
does not exist on primary storage, back up the cloud database, then
delete the row in template_spool_ref.  This should force CS to
re-download it (i.e. copy it from secondary storage to primary again,
use it to deploy system VMs, and create a new entry for it in
template_spool_ref).
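
The decision above can be written down as a small check.  This is a rough
Python sketch, not anything CloudStack ships: suggest_action is a
hypothetical helper, and it assumes the template file sits directly under
the primary storage mount point on the host (/mnt/<pool uuid>, as reported
earlier in this thread) under the name stored in local_path.

```python
import os


def suggest_action(download_state, pool_mount, local_path):
    """Map the DB row state and the on-disk state to a next step.

    download_state: value of download_state in the template_spool_ref row
    pool_mount:     where primary storage is mounted on the host
                    (assumed to be /mnt/<pool uuid>)
    local_path:     value of local_path in the same row (the file name)
    """
    on_disk = os.path.isfile(os.path.join(pool_mount, local_path))
    if download_state == "DOWNLOADED" and not on_disk:
        return "stale row: back up the cloud database, then delete the row"
    if on_disk:
        return "file present: compare size and md5sum with secondary storage"
    return "row not marked DOWNLOADED and file absent: keep investigating"


if __name__ == "__main__":
    print(suggest_action(
        "DOWNLOADED",
        "/mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9",
        "f23a16e7-b628-429e-83e1-698935588465",
    ))
```

It only reports what to do next; the backup and the DELETE are still done
by hand.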

If it does exist on primary storage, maybe the file is corrupt.  Compare
the size and md5sum to the original on secondary storage.  Let us know
how it goes.
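
For the size and md5sum comparison, something along these lines should
work once both copies are reachable from one machine.  It is a sketch
using only the Python standard library; the function names are
illustrative, and the two paths are whatever the files turn out to be on
your mounts.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1024 * 1024):
    """Return (size in bytes, md5 hex digest), reading the file in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


def same_file(primary_path, secondary_path):
    """True when both size and md5 digest match."""
    return fingerprint(primary_path) == fingerprint(secondary_path)
```

Reading in chunks keeps memory use flat even for a large template file.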

Best regards,
Kirk

On 09/17/2013 04:47 PM, Matt Foley wrote:
> Hi,
> I've now heard that this problem, of Cloudstack being messed up after
> interruption of the NFS shared storage access, is well known.  Does
> anyone have a fix or work-around?
> 
> Kirk, thanks for your help so far.
> Both the master and the host servers can mount both primary and
> secondary stores, and read and write them.  No permissions nor IP access
> seem broken.
> 
> I also checked the log levels on the hosts, and both FILE and com.cloud
> were already set to DEBUG.  I tried setting them to TRACE, but got no
> additional useful info.
> 
> On the host, I tried just restarting the cloudstack-agent service.  In
> the resulting logs, the following snippet occurs.  The best
> interpretation I can make of it is that "no storage vol with matching
> name 'f23a16e7-b628-429e-83e1-698935588465'" is the key issue, and that
> should relate to secondary storage, where the templates are stored.  But
> this uuid doesn't seem to be related to the actual secondary storage
> pool, whose uuid is b7fd7b11-c0f7-4717-8343-ff6fb9bff860.  The primary
> storage pool is uuid 9c6fd9a3-43e5-389a-9594-faecf178b4b9, and it seems
> to be properly automatically mounted on all hosts and the master.  
> 
> ** It concerns me that the secondary storage pool does NOT seem to be
> automatically mounted.  Is it supposed to be?  If not, how are the hosts
> supposed to find the templates, before a System Router VM can even be
> set up?
> 
> Below is the relevant host agent.log snippet, and also a dump of the
> storage_pool table from mysql.
> 
> Thanks in advance for any suggestions.
> --Matt
> 
> ======================
> 2013-09-17 15:26:46,012 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Processing command:
> com.cloud.agent.api.storage.CreateCommand
> 2013-09-17 15:26:46,050 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-4:null) Failed to create volume:
> com.cloud.utils.exception.CloudRuntimeException:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'
> 2013-09-17 15:26:46,051 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-4:null) Seq 14-606340093:  { Ans: , MgmtId:
> 161340856362, via: 14, Ver: v1, Flags: 110,
> [{"storage.CreateAnswer":{"requestTemplateReload":false,"result":false,"details":"Exception:
> com.cloud.utils.exception.CloudRuntimeException\nMessage:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'\nStack:
> com.cloud.utils.exception.CloudRuntimeException:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getVolume(LibvirtStorageAdaptor.java:90)\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getPhysicalDisk(LibvirtStorageAdaptor.java:437)\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.getPhysicalDisk(LibvirtStoragePool.java:123)\n\tat
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:1279)\n\tat
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1072)\n\tat
> com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat
> com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat
> com.cloud.utils.nio.Task.run(Task.java:83)\n\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
> java.lang.Thread.run(Thread.java:679)\n","wait":0}}] }
> 2013-09-17 15:26:46,192 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-1:null) Request:Seq 14-606340094:  { Cmd , MgmtId:
> 161340856362, via: 14, Ver: v1, Flags: 100111,
> [{"storage.CreateCommand":{"volId":10510,"pool":{"id":201,"uuid":"9c6fd9a3-43e5-389a-9594-faecf178b4b9","host":"10.42.1.101","path":"/srv/nfs/eng/cs-primary","port":2049,"type":"NetworkFilesystem"},"diskCharacteristics":{"size":725811200,"tags":[],"type":"ROOT","name":"ROOT-10429","useLocalStorage":false,"recreatable":true,"diskOfferingId":7,"volumeId":10510,"hyperType":"KVM"},"templateUrl":"f23a16e7-b628-429e-83e1-698935588465","wait":0}}]
> }
> 2013-09-17 15:26:46,192 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-1:null) Processing command:
> com.cloud.agent.api.storage.CreateCommand
> 2013-09-17 15:26:46,228 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-1:null) Failed to create volume:
> com.cloud.utils.exception.CloudRuntimeException:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'
> 2013-09-17 15:26:46,229 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-1:null) Seq 14-606340094:  { Ans: , MgmtId:
> 161340856362, via: 14, Ver: v1, Flags: 110,
> [{"storage.CreateAnswer":{"requestTemplateReload":false,"result":false,"details":"Exception:
> com.cloud.utils.exception.CloudRuntimeException\nMessage:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'\nStack:
> com.cloud.utils.exception.CloudRuntimeException:
> org.libvirt.LibvirtException: Storage volume not found: no storage vol
> with matching name 'f23a16e7-b628-429e-83e1-698935588465'\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getVolume(LibvirtStorageAdaptor.java:90)\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getPhysicalDisk(LibvirtStorageAdaptor.java:437)\n\tat
> com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.getPhysicalDisk(LibvirtStoragePool.java:123)\n\tat
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:1279)\n\tat
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1072)\n\tat
> com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat
> com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat
> com.cloud.utils.nio.Task.run(Task.java:83)\n\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
> java.lang.Thread.run(Thread.java:679)\n","wait":0}}] }
> 2013-09-17 15:26:46,271 DEBUG [cloud.agent.Agent]
> (agentRequest-Handler-2:null) Request:Seq 14-606340095:  { Cmd , MgmtId:
> 161340856362, via: 14, Ver: v1, Flags: 100111,
> [{"StopCommand":{"isProxy":false,"vmName":"v-10415-VM","wait":0}}] }
> 
> ======================
> 
> dump from mysql of the "storage_pool" table:
> 
> ======================
> --
> -- Table structure for table `storage_pool`
> --
> 
> DROP TABLE IF EXISTS `storage_pool`;
> /*!40101 SET @saved_cs_client     = @@character_set_client */;
> /*!40101 SET character_set_client = utf8 */;
> CREATE TABLE `storage_pool` (
>   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
>   `name` varchar(255) DEFAULT NULL COMMENT 'should be NOT NULL',
>   `uuid` varchar(255) DEFAULT NULL,
>   `pool_type` varchar(32) NOT NULL,
>   `port` int(10) unsigned NOT NULL,
>   `data_center_id` bigint(20) unsigned NOT NULL,
>   `pod_id` bigint(20) unsigned DEFAULT NULL,
>   `cluster_id` bigint(20) unsigned DEFAULT NULL COMMENT 'foreign key to
> cluster',
>   `available_bytes` bigint(20) unsigned DEFAULT NULL,
>   `capacity_bytes` bigint(20) unsigned DEFAULT NULL,
>   `host_address` varchar(255) NOT NULL COMMENT 'FQDN or IP of storage
> server',
>   `user_info` varchar(255) DEFAULT NULL COMMENT 'Authorization
> information for the storage pool. Used by network filesystems',
>   `path` varchar(255) NOT NULL COMMENT 'Filesystem path that is shared',
>   `created` datetime DEFAULT NULL COMMENT 'date the pool created',
>   `removed` datetime DEFAULT NULL COMMENT 'date removed if not null',
>   `update_time` datetime DEFAULT NULL,
>   `status` varchar(32) DEFAULT NULL,
>   `storage_provider_id` bigint(20) unsigned DEFAULT NULL,
>   `scope` varchar(255) DEFAULT NULL,
>   PRIMARY KEY (`id`),
>   UNIQUE KEY `id` (`id`),
>   UNIQUE KEY `id_2` (`id`),
>   UNIQUE KEY `uuid` (`uuid`),
>   KEY `i_storage_pool__pod_id` (`pod_id`),
>   KEY `fk_storage_pool__cluster_id` (`cluster_id`),
>   KEY `i_storage_pool__removed` (`removed`),
>   CONSTRAINT `fk_storage_pool__cluster_id` FOREIGN KEY (`cluster_id`)
> REFERENCES `cluster` (`id`),
>   CONSTRAINT `fk_storage_pool__pod_id` FOREIGN KEY (`pod_id`) REFERENCES
> `host_pod_ref` (`id`) ON DELETE CASCADE
> ) ENGINE=InnoDB AUTO_INCREMENT=247 DEFAULT CHARSET=utf8;
> /*!40101 SET character_set_client = @saved_cs_client */;
> 
> --
> -- Dumping data for table `storage_pool`
> --
> 
> LOCK TABLES `storage_pool` WRITE;
> /*!40000 ALTER TABLE `storage_pool` DISABLE KEYS */;
> INSERT INTO `storage_pool` VALUES
> (201,'cs-primary','9c6fd9a3-43e5-389a-9594-faecf178b4b9','NetworkFilesystem',2049,1,1,1,1552364339200,20916432011264,'10.42.1.101',NULL,'/srv/nfs/eng/cs-primary','2013-06-07
> 08:40:58',NULL,NULL,'Up',NULL,NULL),(205,'cn005','48ef7eec-1e42-4ffa-9182-303c8c8883b4','Filesystem',0,1,1,1,4964460785664,5270660358144,'172.18.128.5',NULL,'/var/lib/libvirt/images/','2013-06-09
> 20:44:10',NULL,NULL,'Up',NULL,NULL),(207,'cn004-10',NULL,'Filesystem',0,1,1,NULL,8117739520,8487899136,'172.18.128.4',NULL,'/var/lib/libvirt/images/','2013-06-10 06:17:53','2013-06-11
> 21:52:54',NULL,'Maintenance',NULL,NULL),(210,'cn004_grid',NULL,'NetworkFilesystem',2049,1,1,1,1645268992,48682147840,'172.18.128.4',NULL,'/grid/1/cloudstack_store','2013-06-10
> 21:48:42','2013-06-20
> 08:53:15',NULL,'Maintenance',NULL,NULL),(215,'cn007','65aab404-6915-44fc-9a5e-c156b663ea67','Filesystem',0,1,1,1,4984320176128,5247872114688,'172.18.128.7',NULL,'/var/lib/libvirt/images/','2013-06-11
> 15:36:11',NULL,NULL,'Up',NULL,NULL),(216,'cn004-10','dfe2fa90-70fc-4d87-a314-0c7eab429d08','Filesystem',0,1,1,1,5270461812736,5270660358144,'172.18.128.4',NULL,'/var/lib/libvirt/images/','2013-06-11
> 21:54:44',NULL,NULL,'Up',NULL,NULL),(217,'cn003-10','3ea2c222-98fe-4ba9-a83c-c6d12eed1186','Filesystem',0,1,1,1,5232745308160,5270660358144,'172.18.128.3',NULL,'/var/lib/libvirt/images/','2013-06-11
> 22:03:17',NULL,NULL,'Up',NULL,NULL),(218,'cn008','52fd1e05-5153-4e16-94e9-7c851855a3fb','Filesystem',0,1,1,1,5073231945728,5270660358144,'172.18.128.8',NULL,'/var/lib/libvirt/images/','2013-06-11
> 22:09:38',NULL,NULL,'Up',NULL,NULL),(219,'cn009','e6c4ed93-d0ee-429a-a44f-e39f7ece4356','Filesystem',0,1,1,1,5183913791488,5270660358144,'172.18.128.9',NULL,'/var/lib/libvirt/images/','2013-06-11
> 22:14:52',NULL,NULL,'Up',NULL,NULL),(220,'cn010','b8398363-b0d0-4768-870f-b50033baa5dc','Filesystem',0,1,1,1,5242997583872,5270660358144,'172.18.128.10',NULL,'/var/lib/libvirt/images/','2013-06-11
> 22:25:25',NULL,NULL,'Up',NULL,NULL),(221,'cn006','59340ae4-22be-46a6-94d0-f4e44ac74885','Filesystem',0,1,1,1,5251206721536,5270660358144,'172.18.128.6',NULL,'/var/lib/libvirt/images/','2013-06-11
> 22:45:09',NULL,NULL,'Up',NULL,NULL),(222,'cn011',NULL,'Filesystem',0,1,1,NULL,8122257408,8487899136,'172.18.128.11',NULL,'/var/lib/libvirt/images/','2013-06-19
> 03:09:37','2013-06-19
> 03:15:36',NULL,'Maintenance',NULL,NULL),(223,'cn011','ca666329-0081-48c1-837f-4181fdf60cfd','Filesystem',0,1,1,2,5229988343808,5270660358144,'172.18.128.11',NULL,'/var/lib/libvirt/images/','2013-06-20
> 07:25:39',NULL,NULL,'Up',NULL,NULL),(224,'cn012','60be4d38-8b57-491b-8d4c-cd2eb54fb815','Filesystem',0,1,1,2,5142698045440,5270660358144,'172.18.128.12',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:10:19',NULL,NULL,'Up',NULL,NULL),(225,'cn014','2e19dae5-79e2-4ec1-b280-5396fd695c22','Filesystem',0,1,1,2,5140740456448,5270660358144,'172.18.128.14',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:11:07',NULL,NULL,'Up',NULL,NULL),(226,'cn013','09528b9b-c5a9-4bd3-b9fe-fc31ff46afb2','Filesystem',0,1,1,2,5055306797056,5270660358144,'172.18.128.13',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:11:14',NULL,NULL,'Up',NULL,NULL),(227,'cn015','420c3008-8de7-4106-807a-eb2c86b4c261','Filesystem',0,1,1,2,5187185598464,5270660358144,'172.18.128.15',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:11:19',NULL,NULL,'Up',NULL,NULL),(228,'cn016','2cafc2d9-91da-405e-92c6-90b13cd8b068','Filesystem',0,1,1,2,5270461952000,5270660358144,'172.18.128.16',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:11:45',NULL,NULL,'Up',NULL,NULL),(229,'cn017','22dff242-f780-4522-95f5-c01ac62c197c','Filesystem',0,1,1,2,5039361929216,5270660358144,'172.18.128.17',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:12:00',NULL,NULL,'Up',NULL,NULL),(230,'cn018','31b5a0f2-0ea9-47a1-971c-4330539489c7','Filesystem',0,1,1,2,5014768701440,5270660358144,'172.18.128.18',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:12:22',NULL,NULL,'Up',NULL,NULL),(231,'cn019','a28eca04-09c0-4a42-b3a0-aa075fccb154','Filesystem',0,1,1,2,5270461812736,5270660358144,'172.18.128.19',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:17:30',NULL,NULL,'Up',NULL,NULL),(232,'cn020','dfc5d6e4-0f27-4692-8e94-1c89a9410e82','Filesystem',0,1,1,2,4790488539136,5270660358144,'172.18.128.20',NULL,'/var/lib/libvirt/images/','2013-06-20
> 08:17:51',NULL,NULL,'Up',NULL,NULL),(233,'cn061-10',NULL,'Filesystem',0,1,1,NULL,47272779776,48682147840,'172.18.128.61',NULL,'/var/lib/libvirt/images/','2013-07-01
> 15:11:22','2013-07-24
> 01:12:16',NULL,'Maintenance',NULL,NULL),(234,'cn061-10','c01a2cb9-239b-4d0b-b484-886065d888c2','Filesystem',0,1,1,3,5181433708544,5270660358144,'172.18.128.61',NULL,'/var/lib/libvirt/images/','2013-07-24
> 01:16:13',NULL,NULL,'Up',NULL,NULL),(235,'cn062-10','4c5f9b7f-968f-48be-a9c2-2ae2f11d8967','Filesystem',0,1,1,3,5030513332224,5270660358144,'172.18.128.62',NULL,'/var/lib/libvirt/images/','2013-07-24
> 01:46:40',NULL,NULL,'Up',NULL,NULL),(236,'cn063-10','c9a579ec-ed2f-41b4-b89e-c47bc346c4c3','Filesystem',0,1,1,3,4963781529600,5270660358144,'172.18.128.63',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:16:29',NULL,NULL,'Up',NULL,NULL),(237,'cn065-10','be4c89a9-8b9c-4161-8955-5db998c58e34','Filesystem',0,1,1,3,5029360099328,5270660358144,'172.18.128.65',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:35:43',NULL,NULL,'Up',NULL,NULL),(238,'cn064-10','180150d3-cf66-4156-acb9-9338e5294fbc','Filesystem',0,1,1,3,4882664796160,5270660358144,'172.18.128.64',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:37:31',NULL,NULL,'Up',NULL,NULL),(239,'cn067-10','63aa8d84-34c0-4f1e-a66a-247dba851da2','Filesystem',0,1,1,3,5182267789312,5270660358144,'172.18.128.67',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:46:12',NULL,NULL,'Up',NULL,NULL),(240,'cn066-10','b24c1265-3b3c-4aac-bebe-d689961af4bf','Filesystem',0,1,1,3,5207416717312,5270660358144,'172.18.128.66',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:48:58',NULL,NULL,'Up',NULL,NULL),(241,'cn068-10','d34ca2fe-2323-4f1d-bf49-282705e188ef','Filesystem',0,1,1,3,5159436877824,5270660358144,'172.18.128.68',NULL,'/var/lib/libvirt/images/','2013-07-24
> 05:59:22',NULL,NULL,'Up',NULL,NULL),(242,'cn069-10','5227b052-ec01-4fa8-afa1-27877f79818a','Filesystem',0,1,1,3,5111256465408,5270660358144,'172.18.128.69',NULL,'/var/lib/libvirt/images/','2013-07-24
> 06:01:52',NULL,NULL,'Up',NULL,NULL),(243,'cn070-10','c28fcfc0-c443-452d-959c-9fa5d01b57e4','Filesystem',0,1,1,3,4914289025024,5270660358144,'172.18.128.70',NULL,'/var/lib/libvirt/images/','2013-07-24
> 06:05:23',NULL,NULL,'Up',NULL,NULL),(244,'cn071-10','fe972842-d227-4eff-9730-3c4043842efb','Filesystem',0,1,1,4,5054019776512,5270660358144,'172.18.128.71',NULL,'/var/lib/libvirt/images/','2013-07-24
> 06:14:36',NULL,NULL,'Up',NULL,NULL),(245,'cn072-10','9dae6eff-6c2d-4091-88f1-682e23bc4424','Filesystem',0,1,1,4,5228991623168,5270660358144,'172.18.128.72',NULL,'/var/lib/libvirt/images/','2013-07-24
> 06:16:55',NULL,NULL,'Up',NULL,NULL),(246,'cn073-10','937f263b-1a14-488c-be5c-ba19e9a598aa','Filesystem',0,1,1,4,8107274240,8487899136,'172.18.128.73',NULL,'/var/lib/libvirt/images/','2013-09-17
> 06:55:01',NULL,NULL,'Up',NULL,NULL);
> /*!40000 ALTER TABLE `storage_pool` ENABLE KEYS */;
> 
> ======================
> 
> 
> On Tue, Sep 17, 2013 at 1:41 AM, Kirk Kosinski <kirkkosinski@gmail.com> wrote:
> 
>     Hi, here is the error:
> 
>     2013-09-16 15:08:17,168 DEBUG [agent.transport.Request]
>     (AgentManager-Handler-5:null) Seq 13-931004532: Processing:  { Ans: ,
>     MgmtId: 161340856362, via: 13, Ver: v1, Flags: 110,
>     [{"storage.CreateAnswer":{"requestTemplateReload":false,"result":false,"details":"Exception:
>     com.cloud.utils.exception.CloudRuntimeException\nMessage:
>     org.libvirt.LibvirtException: Storage volume not found: no storage vol
>     with matching name 'f23a16e7-b628-429e-83e1-698935588465'\nStack:
>     com.cloud.utils.exception.CloudRuntimeException:
>     org.libvirt.LibvirtException: Storage volume not found: no storage vol
>     with matching name 'f23a16e7-b628-429e-83e1-698935588465'\n\tat
>     com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getVolume(LibvirtStorageAdaptor.java:90)\n\tat
>     com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getPhysicalDisk(LibvirtStorageAdaptor.java:437)\n\tat
>     com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.getPhysicalDisk(LibvirtStoragePool.java:123)\n\tat
>     com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:1279)\n\tat
>     com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1072)\n\tat
>     com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat
>     com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat
>     com.cloud.utils.nio.Task.run(Task.java:83)\n\tat
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
>     java.lang.Thread.run(Thread.java:679)\n","wait":0}}] }
> 
>     I'm not certain what volume it is complaining about, but I suspect
>     secondary storage.  Log on to a host (in particular host 13 [1] since it
>     is confirmed to suffer from the issue) and try to manually mount the
>     full path of the directory with the system VM template of the secondary
>     storage NFS share [2].  The idea is to confirm the share and
>     subdirectories of the share are mountable.  Maybe during the maintenance
>     some hosts changed IPs and/or the secondary storage NFS share
>     permissions (or other settings) were messed up.
> 
>     If the mount doesn't work, fix whatever is causing it.  If it does work,
>     please collect additional info.  Enable DEBUG logging on the hosts [3]
>     (if necessary), wait for the error to occur, and upload the agent.log
>     from the host with the error.  It should have more details besides the
>     exception shown in the management-server.log.  If you have a lot of
>     hosts and don't want to enable DEBUG logging on every one, temporarily
>     disable most of them and do it on the remaining few.
> 
>     Best regards,
>     Kirk
> 
>     [1] "13" is the id of the host in the CloudStack database, so find out
>     which host it is with:
>     select * from `cloud`.`host` where id = 13 \G
> 
>     [2] Something like:
>     nfshost:/share/template/tmpl/2/123
> 
>     [3] In /etc/cloudstack/agent/log4j-cloud.xml, set the Threshold for FILE
>     and com.cloud to DEBUG.  Depending on the CloudStack version, it may or
>     may not be enabled by default, and the path may be /etc/cloud/agent/.
> 
> 
>     On 09/16/2013 07:36 PM, sriharsha work wrote:
>     > Replying on behalf of Matt. We are able to write data to the NFS
>     > drives. That's not an issue.
>     >
>     > Thanks
>     > Sriharsha
>     >
>     > Sent from my iPhone
>     >
>     >> On Sep 16, 2013, at 19:30, Ahmad Emneina <aemneina@gmail.com> wrote:
>     >>
>     >> Try to mount your primary storage to a compute host and try to write
>     >> to it. Your NFS server might not have come back up properly
>     >> (settings-wise or all the relevant services).
>     >>> On Sep 16, 2013 6:08 PM, "Matt Foley" <mfoley@hortonworks.com> wrote:
>     >>>
>     >>> Thank you Chiradeep.  Log snippet now available as
>     >>> http://apaste.info/qBIB
>     >>> --Matt
>     >>>
>     >>> On Mon, Sep 16, 2013 at 5:19 PM, Chiradeep Vittal
>     >>> <Chiradeep.Vittal@citrix.com> wrote:
>     >>>
>     >>>> Attachments are stripped. Can you paste (say at
>     >>>> http://apaste.info/)
>     >>>>
>     >>>> From: Matt Foley <mfoley@hortonworks.com>
>     >>>> Date: Monday, September 16, 2013 4:58 PM
>     >>>>
>     >>>> We had a planned network outage this weekend, which inadvertently
>     >>>> resulted in making the NFS Shared Primary Storage (used by System
>     >>>> VMs) unavailable for a day and a half.  (Guest VMs use local storage
>     >>>> only, but System VMs use shared storage only.)  Cloudstack was not
>     >>>> brought down prior to the outage.
>     >>>>
>     >>>> After network came back, we gracefully brought down all services
>     >>>> including cloudstack-management, mysql, and NFS, then actually
>     >>>> rebooted all servers in the cluster and the NFS server (to make sure
>     >>>> no stale file handles), then brought up services in the appropriate
>     >>>> order.  Also checked mysql for table corruption, and found none.
>     >>>> Confirmed that the NFS volumes are mountable from all hosts, and in
>     >>>> fact Shared Primary Storage is being mounted by cloudstack on hosts
>     >>>> as usual, under /mnt/<uuid>.
>     >>>>
>     >>>> Nevertheless, when we try to bring up the cluster, we fail to start
>     >>>> the system VMs, with errors "InsufficientServerCapacityException:
>     >>>> Unable to create a deployment for VM".  The cause is not really
>     >>>> insufficient capacity, as actual usage of resources is tiny; these
>     >>>> error messages are false explanations of the failure to create
>     >>>> primary storage volume for the System VMs.
>     >>>>
>     >>>> Digging into management-server.log, the core issue seems to be the
>     >>>> ~160 line snippet from the log attached to this message as
>     >>>> cloudstack_debug_2013.09.16.log. The only Shared Primary Storage pool
>     >>>> is pool 201, named "cs-primary".  It is mounted on all hosts as
>     >>>> /mnt/9c6fd9a3-43e5-389a-9594-faecf178b4b9, which is its uuid.  The
>     >>>> log shows the management server correctly identifying a particular
>     >>>> host as being able to access pool 201, then trying to allocate a
>     >>>> primary storage volume using the template with uuid
>     >>>> f23a16e7-b628-429e-83e1-698935588465.  It fails, but I cannot tell
>     >>>> why.  I suspect its claim that "Template 3 has already been
>     >>>> downloaded to pool 201" is false, but I don't know how to check this
>     >>>> (or fix if wrong).
>     >>>>
>     >>>> Any guidance for further debugging or fixing this would be GREATLY
>     >>>> appreciated.
>     >>>> Thanks,
>     >>>> --Matt
>     >>>
>     >>> --
>     >>> CONFIDENTIALITY NOTICE
>     >>> NOTICE: This message is intended for the use of the individual or
>     >>> entity to which it is addressed and may contain information that is
>     >>> confidential, privileged and exempt from disclosure under applicable
>     >>> law. If the reader of this message is not the intended recipient, you
>     >>> are hereby notified that any printing, copying, dissemination,
>     >>> distribution, disclosure or forwarding of this communication is
>     >>> strictly prohibited. If you have received this communication in
>     >>> error, please contact the sender immediately and delete it from your
>     >>> system. Thank You.
>     >>>
> 
> 
> 
