cloudstack-users mailing list archives

From Nik Martin <nik.mar...@nfinausa.com>
Subject Re: Storage failure in not handled well in CS
Date Wed, 03 Oct 2012 13:51:33 GMT
Bump?  This is a serious issue that I need to get resolved.  An entire 
cloud going down while one SAN is being repaired is a bad thing.  My 
cloud controller still refuses to start VMs because it cannot connect to 
a SAN that is in maintenance mode and is offline.
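
To check which pool the hosts are actually failing on, here is a minimal sketch in Python. It is not part of CloudStack; the function name `failing_pools` and the regular expression are assumptions based only on the "Unable to connect to pool Pool[204|IscsiLUN]" line in the log excerpt quoted below, and the message format may differ in other versions.

```python
import re
from collections import Counter

# Pattern assumed from the single log line quoted in this thread:
#   "Unable to connect to pool Pool[204|IscsiLUN]"
POOL_RE = re.compile(r"Unable to connect to pool\s+Pool\[(\d+)\|(\w+)\]")


def failing_pools(log_text):
    """Count connection failures per (pool id, pool type) in a log excerpt."""
    # The excerpt in this thread is line-wrapped, so collapse all
    # whitespace before matching to survive wrapped messages.
    flat = " ".join(log_text.split())
    return Counter(POOL_RE.findall(flat))


sample = """
2012-10-02 15:10:40,370 WARN  [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Unable to connect due to
com.cloud.exception.ConnectionException: Unable to connect to pool
Pool[204|IscsiLUN]
"""

if __name__ == "__main__":
    for (pool_id, pool_type), count in failing_pools(sample).items():
        print(pool_id, pool_type, count)
```

Running this over the full management log and comparing the reported pool ids with the pools that are in maintenance mode should show whether the hosts go into Alert only because of the offline SAN.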


On 10/02/2012 03:12 PM, Nik Martin wrote:
> I have two SANs connected to CS as primary storage.  One is an HD-based
> SAN, with a single target and LUN, and the other is an SSD SAN split
> into two volumes, each connected with a target and LUN.  The HD SAN is
> where all system VMs are stored (or they were before I added the HD SAN,
> but I have no idea where the system VM volumes are stored).  This
> morning, I had to do a semi-emergency shutdown of the SSD SAN, so I put
> both LUNs in emergency maintenance mode in CS.  CS shut down the entire
> cloud, not just the volumes stored on the SSD SAN.  The SAN is offline,
> and CS shows it in maintenance mode, but NO VMs will start, and the CS
> management log shows:
>
> onnecting; event = AgentDisconnected; new status = Alert; old update
> count = 959; new update count = 960]
> 2012-10-02 15:10:40,370 DEBUG [agent.manager.ClusteredAgentManagerImpl]
> (AgentTaskPool-2:null) Notifying other nodes of to disconnect
> 2012-10-02 15:10:40,370 WARN  [cloud.resource.ResourceManagerImpl]
> (AgentTaskPool-2:null) Unable to connect due to
> com.cloud.exception.ConnectionException: Unable to connect to pool
> Pool[204|IscsiLUN]
>      at
>      at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>
>      at java.lang.Thread.run(Thread.java:679)
> Caused by: com.cloud.exception.StorageUnavailableException: Resource
> [StoragePool:204] is unreachable: Unable establish connection from
> storage head to storage pool 204 due to ModifyStoragePoolCommand add
> XenAPIException:Can not see storage pool:
> cfd3b016-d4d9-3bb9-b1f9-f31374c44185 from on
> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd
> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
> 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/0
>      at
> com.cloud.storage.StorageManagerImpl.connectHostToSharedPool(StorageManagerImpl.java:1567)
>
>      at
> com.cloud.storage.listener.StoragePoolMonitor.processConnect(StoragePoolMonitor.java:88)
>
>      ... 8 more
> 2012-10-02 15:10:40,371 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
> Transition:[Resource state = Enabled, Agent event = AgentDisconnected,
> Host id = 6, name = hv1]
> 2012-10-02 15:10:40,375 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
> Agent status update: [id = 6; name = hv1; old status = Alert; event =
> AgentDisconnected; new status = Alert; old update count = 960; new
> update count = 961]
>
>
> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
> 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/1 is the SAN that is
> in maintenance mode, so why is CS still trying to connect?  All my HVs
> are in alert state because of this.
>


-- 
Regards,

Nik

Nik Martin
VP Business Development
Nfina Technologies, Inc.
+1.251.243.0043 x1003
Relentless Reliability
