cloudstack-users mailing list archives

From: ilya musayev <ilya.mailing.li...@gmail.com>
Subject: Re: Management Server won't connect after cluster shutdown and restart
Date: Sat, 30 Aug 2014 18:12:06 GMT
Can you tell us more, please?

In my rather large environments, I may need to do several restarts for
CloudStack to come up properly.

Otherwise it complains that SSVM and CPVM are not ready to launch in Zone X.
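
By restart I just mean the usual, e.g.:

    service cloudstack-management restart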

Thanks
ilya
On 8/30/14, 5:29 AM, Ian Duffy wrote:
> Hi All,
>
> Thank you very much for the help.
>
> Ended up solving the issue. There was an invalid value in our configuration
> table which seemed to prevent a lot of DAOs from being autowired.
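>
> For anyone who hits the same thing, eyeballing the configuration table for
> odd values is a reasonable first step - a rough sketch, assuming the default
> 'cloud' database:
>
>     SELECT name, value FROM cloud.configuration
>     WHERE value IS NULL OR value = '';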
>
>
>
>
> On 29 August 2014 21:16, Paul Angus <paul.angus@shapeblue.com> wrote:
>
>> Hi Ian,
>>
>> I've seen this kind of behaviour before with KVM hosts reconnecting.
>>
>> There’s a SELECT ... FOR UPDATE query on the op_ha_work table which locks
>> the table, stopping other hosts from updating their status. If there are a
>> lot of entries in there, they all lock each other out. Deleting the entries
>> fixed the problem, but you then have to deal with hosts and VMs being
>> up/down yourself.
>>
>> So check the op_ha_work table for a large number of entries, which can lock
>> up the database. If you can, check the database for the queries it is
>> handling - that would be best.
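>>
>> As a rough sketch (assuming the default 'cloud' database - take a backup
>> first, since deleting the HA work items means handling host/VM state
>> yourself):
>>
>>     SELECT COUNT(*) FROM cloud.op_ha_work;
>>     SHOW FULL PROCESSLIST;          -- see which queries MySQL is busy with
>>     DELETE FROM cloud.op_ha_work;   -- only if you accept the caveat above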
>>
>> Also check that the management server and MySQL DB are tuned for the load
>> that's being thrown at them
>> (http://support.citrix.com/article/CTX132020).
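>>
>> As an illustrative starting point only (size these to your own environment;
>> the install docs suggest roughly 350 connections per management server):
>>
>>     # /etc/my.cnf
>>     [mysqld]
>>     max_connections=350
>>     innodb_buffer_pool_size=1G
>>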
>> Remember that if you have other services such as Nagios or Puppet/Chef
>> directly reading the DB, they add to the number of connections into the
>> MySQL DB - I have seen the management server starved of MySQL connections
>> when a lot of hosts are brought back online.
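>>
>> You can watch for that kind of connection starvation with, for example:
>>
>>     mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"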
>>
>>
>> Regards
>>
>> Paul Angus
>> Cloud Architect
>> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
>> paul.angus@shapeblue.com
>>
>> -----Original Message-----
>> From: creategui@gmail.com [mailto:creategui@gmail.com] On Behalf Of
>> Carlos Reategui
>> Sent: 29 August 2014 20:55
>> To: users@cloudstack.apache.org
>> Subject: Re: Management Server won't connect after cluster shutdown and
>> restart
>>
>> Hi Ian,
>>
>> So the root of the problem was that the machines were not started up in
>> the correct order.
>>
>> My plan had been to stop all VMs from CS, then stop CS, then shut down the
>> VM hosts.  On the other end, the hosts needed to be brought up first, and
>> once they were OK, bring up the CS machine and make sure everything was in
>> the same state it thought things were in when it was shut down.
>> Unfortunately, CS came up before everything else was the way it expected it
>> to be, and I did not realize that at the time.
>>
>> To resolve it, I went back to my CS DB backup from right after I shut down
>> the MS, made sure the VM hosts were all as expected, and then started the MS.
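>>
>> The backup/restore itself was just the usual mysqldump round trip, something
>> like:
>>
>>     mysqldump -u root -p cloud > cloud-backup.sql   # taken right after stopping the MS
>>     mysql -u root -p cloud < cloud-backup.sql       # restored before starting the MS again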
>>
>>
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 8:02 AM, Ian Duffy <ian@ianduffy.ie> wrote:
>>
>>> Hi Carlos,
>>>
>>> Did you ever find a fix for this?
>>>
>>> I'm seeing the same issue on 4.1.1 with VMware ESXi.
>>>
>>>
>>> On 29 October 2013 04:54, Carlos Reategui <creategui@gmail.com> wrote:
>>>
>>>> Update.  I cleared out the async_job table and also reset the system VMs
>>>> it thought were in Starting mode from my previous attempts, by setting
>>>> them from Starting to Stopped.  I also re-set the XS pool master to be
>>>> the one XS thinks it is.
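>>>>
>>>> In case it helps anyone later, the resets were plain SQL against the cloud
>>>> database - roughly along these lines (check the rows first and take a dump
>>>> before touching anything):
>>>>
>>>>     DELETE FROM cloud.async_job;
>>>>     UPDATE cloud.vm_instance SET state = 'Stopped'
>>>>     WHERE state = 'Starting' AND type != 'User';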
>>>>
>>>> Now when I start the CS MS, here are the logs leading up to the first
>>>> exception about being unable to reach the pool:
>>>>
>>>> 2013-10-28 21:27:11,040 DEBUG [cloud.alert.ClusterAlertAdapter] (Cluster-Notification-1:null) Management server node 172.30.45.2 is up, send alert
>>>>
>>>> 2013-10-28 21:27:11,045 WARN  [cloud.cluster.ClusterManagerImpl] (Cluster-Notification-1:null) Notifying management server join event took 9 ms
>>>>
>>>> 2013-10-28 21:27:23,236 DEBUG [cloud.server.StatsCollector] (StatsCollector-2:null) HostStatsCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,243 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) VmStatsCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,247 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) StorageCollector is running...
>>>>
>>>> 2013-10-28 21:27:23,255 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) There is no secondary storage VM for secondary storage host nfs://172.30.45.2/store/secondary
>>>>
>>>> 2013-10-28 21:27:23,273 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 1-201916421: Forwarding null to 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,274 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-9:null) Seq 1-201916421: Routing from 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-9:null) Seq 1-201916421: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-9:null) Seq 1-201916421: MgmtId 233845174730253: Req: Resource [Host:1] is unreachable: Host 1: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,275 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-9:null) Seq 1--1: MgmtId 233845174730253: Req: Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-11:null) Seq 1--1: MgmtId 233845174730253: Req: Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-11:null) Seq 1-201916421: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,277 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 1-201916421: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,277 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>
>>>> 2013-10-28 21:27:23,277 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 1-201916421: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,278 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 1-201916421: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 201916421 to Host 1 timed out after 3600
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 1 statistics.
>>>>
>>>> 2013-10-28 21:27:23,278 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 1
>>>>
>>>> 2013-10-28 21:27:23,281 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-1:null) Seq 1-201916422: Forwarding null to 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-12:null) Seq 1-201916422: Routing from 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-12:null) Seq 1-201916422: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,283 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-12:null) Seq 1-201916422: MgmtId 233845174730253: Req: Resource [Host:1] is unreachable: Host 1: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,284 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-12:null) Seq 1--1: MgmtId 233845174730253: Req: Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 1--1: MgmtId 233845174730253: Req: Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-13:null) Seq 1-201916422: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 1-201916422: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,286 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-1:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>
>>>> 2013-10-28 21:27:23,286 WARN  [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 1-201916422: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,286 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 1-201916422: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,288 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-1:null) Unable to send storage pool command to Pool[200|LVM] via 1
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 201916422 to Host 1 timed out after 3600
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,289 INFO  [cloud.server.StatsCollector] (StatsCollector-1:null) Unable to reach Pool[200|LVM]
>>>>
>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200] is unreachable: Unable to send command to the pool
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,300 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 2-1168703496: Forwarding null to 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-14:null) Seq 2-1168703496: Routing from 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,301 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-14:null) Seq 2-1168703496: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-14:null) Seq 2-1168703496: MgmtId 233845174730253: Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,302 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-14:null) Seq 2--1: MgmtId 233845174730253: Req: Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-15:null) Seq 2-1168703496: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,303 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-1168703496: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,304 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-1168703496: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,304 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-1168703496: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 1168703496 to Host 2 timed out after 3600
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 2 statistics.
>>>>
>>>> 2013-10-28 21:27:23,304 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 2
>>>>
>>>> 2013-10-28 21:27:23,307 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-1:null) Seq 2-1168703497: Forwarding null to 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 2-1168703497: Routing from 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 2-1168703497: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-1:null) Seq 2-1168703497: MgmtId 233845174730253: Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,308 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-1:null) Seq 2--1: MgmtId 233845174730253: Req: Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-5:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-5:null) Seq 2-1168703497: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703497: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,310 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-1:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>
>>>> 2013-10-28 21:27:23,310 WARN  [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703497: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703497: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,310 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-1:null) Unable to send storage pool command to Pool[201|LVM] via 2
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703497 to Host 2 timed out after 3600
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,311 INFO  [cloud.server.StatsCollector] (StatsCollector-1:null) Unable to reach Pool[201|LVM]
>>>>
>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:201] is unreachable: Unable to send command to the pool
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> 2013-10-28 21:27:23,328 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-1:null) Seq 2-1168703498: Forwarding null to 233845174730255
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-3:null) Seq 2-1168703498: Routing from 233845174730253
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-3:null) Seq 2-1168703498: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,329 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-3:null) Seq 2-1168703498: MgmtId 233845174730253: Req: Resource [Host:2] is unreachable: Host 2: Link is closed
>>>>
>>>> 2013-10-28 21:27:23,330 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-3:null) Seq 2--1: MgmtId 233845174730253: Req: Routing to peer
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-4:null) Seq 2--1: MgmtId 233845174730253: Req: Cancel request received
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-4:null) Seq 2-1168703498: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,331 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703498: Waiting some more time because this is the current command
>>>>
>>>> 2013-10-28 21:27:23,331 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-1:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>
>>>> 2013-10-28 21:27:23,332 WARN  [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703498: Timed out on null
>>>>
>>>> 2013-10-28 21:27:23,332 DEBUG [agent.manager.AgentAttache] (StatsCollector-1:null) Seq 2-1168703498: Cancelling.
>>>>
>>>> 2013-10-28 21:27:23,332 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-1:null) Unable to send storage pool command to Pool[202|NetworkFilesystem] via 2
>>>>
>>>> com.cloud.exception.OperationTimedoutException: Commands 1168703498 to Host 2 timed out after 3600
>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>
>>>> iptables is disabled on the XS hosts, so the connection problem is not a
>>>> firewall issue.
>>>>
>>>> If I do an xe sr-list I see all 3 of the above SRs, and the hosts
>>>> have mounted the NFS SR and can access it.
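>>>>
>>>> What I mean by checking is roughly this, with the uuid filled in:
>>>>
>>>>     xe sr-list params=uuid,name-label,type
>>>>     xe pbd-list sr-uuid=<sr-uuid> params=host-uuid,currently-attached
>>>>
>>>> which shows whether each host has the SR plugged.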
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 28, 2013 at 9:05 PM, Carlos Reategui
>>>> <carlos@reategui.com
>>>>> wrote:
>>>>> Using CS 4.1.1 with 2 hosts running XS 6.0.2
>>>>>
>>>>> Had to shut everything down and now I am having problems bringing things
>>>>> up.
>>>>>
>>>>> As suggested, I used CS to stop all my instances as well as the system
>>>>> VMs and the SR. Then I shut down the XS 6.0.2 servers after enabling
>>>>> maintenance mode from the CS console.
>>>>>
>>>>> After bringing things up, my XS servers had the infamous interface-rename
>>>>> issue, which I resolved by editing the udev rules file manually.
>>>>>
>>>>> Now I have my XS servers up, but for some reason my pool master got
>>>>> changed, so I used xe pool-designate-new-master to switch it back.
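>>>>>
>>>>> i.e. something along the lines of:
>>>>>
>>>>>     xe pool-designate-new-master host-uuid=<uuid-of-original-master>
>>>>>
>>>>> with the uuid taken from xe host-list.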
>>>>>
>>>>> I did not notice that this designation change had been picked up by CS,
>>>>> and when starting it up it keeps trying to connect to the wrong pool
>>>>> master. Should I switch XS to match CS, or what do I need to change in
>>>>> CS to tell it what the pool master is?
>>>>>
>>>>> I tried putting the server that CS thinks is the master into maintenance
>>>>> mode from CS, but that just ends up in an apparently infinite cycle
>>>>> spitting out endless lines like these:
>>>>>
>>>>> 2013-10-28 20:39:02,059 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,060 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-11:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,062 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,063 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-7:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,064 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,066 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-14:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,067 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-2:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> 2013-10-28 20:39:02,068 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-12:null) Seq 2-855048230: Forwarding Seq 2-855048230:  { Cmd , MgmtId: 233845174730253, via: 2, Ver: v1, Flags: 100111, [{"MaintainCommand":{"wait":0}}] } to 233845174730255
>>>>>
>>>>> After stopping and restarting the MS, the first error I see is:
>>>>>
>>>>> 2013-10-28 20:41:53,749 DEBUG [cloud.api.ApiServlet] (catalina-exec-1:null) ===START===  10.110.3.70 -- GET  command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
>>>>>
>>>>> 2013-10-28 20:41:53,756 ERROR [cloud.api.ApiServlet] (catalina-exec-1:null) unknown exception writing api response
>>>>>
>>>>> java.lang.NullPointerException
>>>>>          at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:280)
>>>>>          at com.cloud.user.AccountManagerImpl.getSystemUser(AccountManagerImpl.java:143)
>>>>>          at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:238)
>>>>>          at com.cloud.api.ApiServlet.doGet(ApiServlet.java:66)
>>>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>>          at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>>          at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>          at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>>          at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>>          at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>>          at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>>          at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:615)
>>>>>          at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>>          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>>>          at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>>          at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
>>>>>          at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 20:41:53,761 DEBUG [cloud.api.ApiServlet] (catalina-exec-1:null) ===END===  10.110.3.70 -- GET  command=queryAsyncJobResult&jobId=d695b8ba-53b5-4e22-8e97-54e5ed236f88&response=json&sessionkey=r4nsNGoidS8enQWHRKbV2AUNeac%3D&_=1383018110624
>>>>>
>>>>> Then I see a few of these:
>>>>>
>>>>> 2013-10-28 20:42:01,464 WARN  [agent.manager.ClusteredAgentManagerImpl] (HA-Worker-4:work-10) Unable to connect to peer management server: 233845174730255, ip: 172.30.45.2 due to Connection refused
>>>>>
>>>>> java.net.ConnectException: Connection refused
>>>>>          at sun.nio.ch.Net.connect(Native Method)
>>>>>          at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
>>>>>          at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
>>>>>          at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
>>>>>
>>>>> 2013-10-28 20:42:01,468 WARN  [agent.manager.ClusteredAgentManagerImpl] (HA-Worker-2:work-11) Unable to connect to peer management server: 233845174730255, ip: 172.30.45.2 due to Connection refused
>>>>>
>>>>> java.net.ConnectException: Connection refused
>>>>>          at sun.nio.ch.Net.connect(Native Method)
>>>>>          at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:530)
>>>>>          at java.nio.channels.SocketChannel.open(SocketChannel.java:164)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:477)
>>>>>          at com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:172)
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:388)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.ha.CheckOnAgentInvestigator.isVmAlive(CheckOnAgentInvestigator.java:53)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:434)
>>>>>          at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:829)
>>>>>
>>>>> The next error is:
>>>>>
>>>>> 2013-10-28 20:42:01,845 WARN  [utils.nio.Task] (AgentManager-Handler-6:null) Caught the following exception but pushing on
>>>>>
>>>>> java.lang.NullPointerException
>>>>>          at com.google.gson.FieldAttributes.getAnnotationFromArray(FieldAttributes.java:231)
>>>>>          at com.google.gson.FieldAttributes.getAnnotation(FieldAttributes.java:150)
>>>>>          at com.google.gson.VersionExclusionStrategy.shouldSkipField(VersionExclusionStrategy.java:38)
>>>>>          at com.google.gson.DisjunctionExclusionStrategy.shouldSkipField(DisjunctionExclusionStrategy.java:38)
>>>>>          at com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:58)
>>>>>          at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:220)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:197)
>>>>>          at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:56)
>>>>>          at com.cloud.agent.transport.ArrayTypeAdaptor.serialize(ArrayTypeAdaptor.java:37)
>>>>>          at com.google.gson.JsonSerializationVisitor.findAndInvokeCustomSerializer(JsonSerializationVisitor.java:184)
>>>>>          at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:160)
>>>>>          at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:101)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:62)
>>>>>          at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:53)
>>>>>          at com.google.gson.Gson.toJsonTree(Gson.java:220)
>>>>>          at com.google.gson.Gson.toJson(Gson.java:260)
>>>>>          at com.cloud.agent.transport.Request.toBytes(Request.java:316)
>>>>>          at com.cloud.agent.transport.Request.getBytes(Request.java:332)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl.cancel(ClusteredAgentManagerImpl.java:435)
>>>>>          at com.cloud.agent.manager.ClusteredAgentManagerImpl$ClusteredAgentHandler.doTask(ClusteredAgentManagerImpl.java:641)
>>>>>          at com.cloud.utils.nio.Task.run(Task.java:83)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> and then the next set of errors I see over and over is:
>>>>>
>>>>> 2013-10-28 20:42:16,433 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-2:null) Unable to send storage pool command to Pool[200|LVM] via 1
>>>>>
>>>>> com.cloud.exception.OperationTimedoutException: Commands 1112277002 to Host 1 timed out after 3600
>>>>>          at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:429)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:511)
>>>>>          at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:464)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2347)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> 2013-10-28 20:42:16,434 INFO  [cloud.server.StatsCollector] (StatsCollector-2:null) Unable to reach Pool[200|LVM]
>>>>>
>>>>> com.cloud.exception.StorageUnavailableException: Resource [StoragePool:200] is unreachable: Unable to send command to the pool
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:2357)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:422)
>>>>>          at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:436)
>>>>>          at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:316)
>>>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>>>          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>          at java.lang.Thread.run(Thread.java:679)
>>>>>
>>>>> I have tried forcing a reconnect to both hosts, but that ends up maxing
>>>>> out a CPU core and filling up the log file with endless log lines.
>>>>>
>>>>> Any thoughts on how to recover my system?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

