cloudstack-users mailing list archives

From Ahmad Emneina <Ahmad.Emne...@citrix.com>
Subject Re: Storage failure in not handled well in CS
Date Wed, 03 Oct 2012 17:03:43 GMT
Hey Nik,

It appears that the compute host, or cluster, can't connect to the SAN
referenced below. Have you looked at the compute host's logs? They should
be more informative as to why it can't connect to the storage. You should also
have at least one storage pool up to be able to provision against.
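
As a quick way to see which pools the management server is failing to
reach, something like the rough Python sketch below might help. It assumes
the failures appear in the log in the same form as the excerpt quoted
further down ("Unable to connect to pool Pool[<id>|<type>]"); the function
name failing_pools and the sample text are only illustrative, not part of
CloudStack.

```python
import re

# Assumed message shape, taken from the quoted log excerpt:
#   "Unable to connect to pool Pool[204|IscsiLUN]"
POOL_RE = re.compile(r"Unable to connect to pool\s+Pool\[(\d+)\|(\w+)\]")


def failing_pools(log_text):
    """Return sorted, de-duplicated (pool_id, pool_type) pairs from the log."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return sorted({(int(pid), ptype) for pid, ptype in POOL_RE.findall(flat)})


# Hypothetical sample built from the excerpt in this thread.
sample = """WARN  [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Unable to connect due to
com.cloud.exception.ConnectionException: Unable to connect to pool
Pool[204|IscsiLUN]"""

print(failing_pools(sample))
```

The pool IDs it reports can then be compared against the pools you expect
to be in maintenance mode.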

On 10/3/12 6:51 AM, "Nik Martin" <nik.martin@nfinausa.com> wrote:

>Bump?  This is a serious issue that I need to get resolved.  An entire
>cloud going down while one SAN is being repaired is a bad thing.  My
>cloud controller still refuses to start VMs because it cannot connect to
>a SAN that is in maintenance mode and is offline.
>
>
>On 10/02/2012 03:12 PM, Nik Martin wrote:
>> I have two SANs connected to CS as primary storage.  One is an HD-based
>> SAN, with a single target and LUN, and the other is an SSD SAN split
>> into two volumes, each connected with a target and LUN.  The HD SAN is
>> where all system VMs are stored (or they were before I added the HD SAN,
>> but I have no idea where the system VM volumes are stored).  This
>> morning, I had to do a semi-emergency shutdown of the SSD SAN, so I put
>> both LUNs in emergency maintenance mode in CS.  CS shut down the entire
>> cloud, not just the volumes stored in the SSD SAN.  The SAN is offline,
>> and CS shows it in maintenance mode, but NO VMs will start, and the CS
>> management log shows:
>>
>> onnecting; event = AgentDisconnected; new status = Alert; old update
>> count = 959; new update count = 960]
>> 2012-10-02 15:10:40,370 DEBUG [agent.manager.ClusteredAgentManagerImpl]
>> (AgentTaskPool-2:null) Notifying other nodes of to disconnect
>> 2012-10-02 15:10:40,370 WARN  [cloud.resource.ResourceManagerImpl]
>> (AgentTaskPool-2:null) Unable to connect due to
>> com.cloud.exception.ConnectionException: Unable to connect to pool
>> Pool[204|IscsiLUN]
>>      at
>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>      at java.lang.Thread.run(Thread.java:679)
>> Caused by: com.cloud.exception.StorageUnavailableException: Resource
>> [StoragePool:204] is unreachable: Unable establish connection from
>> storage head to storage pool 204 due to ModifyStoragePoolCommand add
>> XenAPIException:Can not see storage pool:
>> cfd3b016-d4d9-3bb9-b1f9-f31374c44185 from on
>> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd
>> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
>> 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/0
>>      at com.cloud.storage.StorageManagerImpl.connectHostToSharedPool(StorageManagerImpl.java:1567)
>>      at com.cloud.storage.listener.StoragePoolMonitor.processConnect(StoragePoolMonitor.java:88)
>>      ... 8 more
>> 2012-10-02 15:10:40,371 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
>> Transition:[Resource state = Enabled, Agent event = AgentDisconnected,
>> Host id = 6, name = hv1]
>> 2012-10-02 15:10:40,375 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
>> Agent status update: [id = 6; name = hv1; old status = Alert; event =
>> AgentDisconnected; new status = Alert; old update count = 960; new
>> update count = 961]
>>
>>
>> host:82cad07f-6fbc-464e-86fe-28bb4af4bbcd pool:
>> 172.16.10.15/iqn.2012-01:com.nfinausa.san2:mirror0/1 is the SAN that is
>> in maintenance mode, so why is CS still trying to connect?  All my HVs
>> are in alert state because of this.
>>
>
>
>-- 
>Regards,
>
>Nik
>
>Nik Martin
>VP Business Development
>Nfina Technologies, Inc.
>+1.251.243.0043 x1003
>Relentless Reliability
>


-- 
Æ



