cloudstack-users mailing list archives

From Anthony Xu <Xuefei...@citrix.com>
Subject RE: Storage failure is not handled well in CS
Date Wed, 03 Oct 2012 19:11:18 GMT
It is a bug; please file a bug report.

You can try the following workaround. In MySQL, run:

    UPDATE storage_pool SET removed = NOW() WHERE id = <id of the primary storage you put into maintenance mode>;
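The workaround relies on CloudStack treating a `storage_pool` row whose `removed` column is set as deleted, so the management server stops trying to reconnect hosts to that pool. The sketch below illustrates that soft-delete pattern, assuming (as the workaround implies) that CloudStack's lookups skip rows with `removed` set. Python's built-in `sqlite3` stands in for MySQL, the table is a hypothetical and heavily simplified model of `storage_pool` (only `id`, `name`, `status` and `removed`), the pool names are made up, and `datetime('now')` replaces MySQL's `NOW()`. Pool id 204 is taken from the log quoted below.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for CloudStack's storage_pool table.
# The real MySQL table has many more columns; only the ones the workaround
# touches are modelled here.
SCHEMA = """
CREATE TABLE storage_pool (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    status  TEXT NOT NULL,
    removed TEXT            -- NULL means the row is still live
)
"""


def active_pools(conn):
    """Return the pools that a soft-delete-aware lookup would still see."""
    rows = conn.execute(
        "SELECT id, name, status FROM storage_pool "
        "WHERE removed IS NULL ORDER BY id"
    )
    return rows.fetchall()


def soft_delete_pool(conn, pool_id):
    """Mirror of the workaround: stamp `removed` instead of deleting the row.

    SQLite has no NOW(); datetime('now') plays the same role here.
    Returns 1 if a live row was marked, 0 if none matched.
    """
    cur = conn.execute(
        "UPDATE storage_pool SET removed = datetime('now') "
        "WHERE id = ? AND removed IS NULL",
        (pool_id,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical sample rows: one healthy pool, one in maintenance mode.
    conn.executemany(
        "INSERT INTO storage_pool (id, name, status) VALUES (?, ?, ?)",
        [(200, "hd-san", "Up"), (204, "ssd-san-mirror0", "Maintenance")],
    )
    print(active_pools(conn))  # both pools are visible
    soft_delete_pool(conn, 204)
    print(active_pools(conn))  # only pool 200 remains visible
```

The row for pool 204 is kept, only hidden, which is why the change can be undone by setting `removed` back to NULL. On the real database, confirm the id (and that `removed` is still NULL) with a SELECT before running the UPDATE.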


Anthony


-----Original Message-----
From: Nik Martin [mailto:nik.martin@nfinausa.com] 
Sent: Wednesday, October 03, 2012 6:52 AM
To: cloudstack-users@incubator.apache.org
Subject: Re: Storage failure is not handled well in CS

Bump?  This is a serious issue that I need to get resolved.  An entire cloud going down while
one SAN is being repaired is a bad thing.  My cloud controller still refuses to start VMs
because it cannot connect to a SAN that is in maintenance mode and is offline.


On 10/02/2012 03:12 PM, Nik Martin wrote:
> I have two SANs connected to CS as primary storage. One is an HD-based
> SAN with a single target and LUN; the other is an SSD SAN split into
> two volumes, each connected with its own target and LUN. The HD SAN is
> where all system VMs are stored (or they were before I added the HD
> SAN, but I have no idea where the system VM volumes are stored now).
> This morning I had to do a semi-emergency shutdown of the SSD SAN, so
> I put both LUNs in emergency maintenance mode in CS. CS shut down the
> entire cloud, not just the volumes stored on the SSD SAN. The SAN is
> offline and CS shows it in maintenance mode, but NO VMs will start,
> and the CS management log shows:
>
> onnecting; event = AgentDisconnected; new status = Alert; old update count = 959; new update count = 960]
> 2012-10-02 15:10:40,370 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentTaskPool-2:null) Notifying other nodes of to disconnect
> 2012-10-02 15:10:40,370 WARN  [cloud.resource.ResourceManagerImpl] (AgentTaskPool-2:null) Unable to connect due to
> com.cloud.exception.ConnectionException: Unable to connect to pool Pool[204|IscsiLUN]
>      at
>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>      at java.lang.Thread.run(Thread.java:679)
> Caused by: com.cloud.exception.StorageUnavailableException: Resource [StoragePool:204] is unreachable: Unable establish connection from storage head to storage pool 204 due to ModifyStoragePoolCommand add XenAPIException:Can not see storage pool: cfd3b016-d4d9-3bb9-b1f9-f31374c44185 from on host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd
> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool: 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/0
>      at com.cloud.storage.StorageManagerImpl.connectHostToSharedPool(StorageManagerImpl.java:1567)
>      at com.cloud.storage.listener.StoragePoolMonitor.processConnect(StoragePoolMonitor.java:88)
>      ... 8 more
> 2012-10-02 15:10:40,371 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 6, name = hv1]
> 2012-10-02 15:10:40,375 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Agent status update: [id = 6; name = hv1; old status = Alert; event = AgentDisconnected; new status = Alert; old update count = 960; new update count = 961]
>
>
> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
> 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/1 is the SAN that
> is in maintenance mode, so why is CS still trying to connect to it? All
> my hypervisors are in Alert state because of this.
>


--
Regards,

Nik

Nik Martin
VP Business Development
Nfina Technologies, Inc.
+1.251.243.0043 x1003
Relentless Reliability