cloudstack-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (CLOUDSTACK-8883) [Blocker] KVM host goes into disconnected state when MS is restarted
Date Wed, 23 Sep 2015 05:53:05 GMT


ASF subversion and git services commented on CLOUDSTACK-8883:

Commit 1a474374b9c936ed40a1d83ce8b75ea189f23399 in cloudstack's branch refs/heads/master from
[;h=1a47437 ]

Merge pull request #863 from borisroman/CLOUDSTACK-8883

[4.6][BLOCKER]CLOUDSTACK-8883: Resolved connect/reconnect issue.

Hi!

@wilderrodrigues by implementing Callable you switched a couple of methods and fields. I switched
them some more!

The Agent wouldn't reconnect for two reasons.

Problem 1: Selector was blocking.
In the while loop at [1], the selector was blocking when the connection was lost. This
means that at [2], _isStartup = false; was never executed. Therefore, at [3], the call to
isStartup() always returned true, resulting in an infinite loop.

Resolution 1: Move the call to cleanUp() [4] before the check of whether isStartup() has
turned false. cleanUp() will close() the _selector, which results in _isStartup being set to false.
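
Resolution 1 can be illustrated with a minimal, self-contained sketch. It is not the CloudStack code: the class SelectorCleanupSketch and the methods runLoop() and demo() are hypothetical names, and only cleanUp(), isStartup() and the selector/flag pair mirror the description above. It relies on the documented behaviour of java.nio.channels.Selector that close() wakes a thread blocked in select().

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the connection class; not the CloudStack source.
class SelectorCleanupSketch {
    private final Selector selector;
    private volatile boolean isStartup = true;

    SelectorCleanupSketch() throws IOException {
        selector = Selector.open();
    }

    // Worker loop: select() blocks indefinitely because no channel is registered.
    void runLoop() {
        try {
            while (selector.isOpen()) {
                selector.select();
            }
        } catch (IOException | ClosedSelectorException e) {
            // The selector failed or was closed under us; fall through to the reset.
        } finally {
            isStartup = false; // only reached once the loop has been left
        }
    }

    // Closing the selector wakes a thread blocked in select().
    void cleanUp() throws IOException {
        selector.close();
    }

    boolean isStartup() {
        return isStartup;
    }

    // Returns true if the flag was reset within five seconds of cleanUp().
    static boolean demo() {
        try {
            SelectorCleanupSketch conn = new SelectorCleanupSketch();
            Thread worker = new Thread(conn::runLoop, "selector-sketch");
            worker.setDaemon(true);
            worker.start();

            conn.cleanUp(); // clean up first ...
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (conn.isStartup() && System.nanoTime() < deadline) {
                Thread.sleep(10); // ... then wait for isStartup() to turn false
            }
            return !conn.isStartup();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flag reset after cleanUp(): " + demo());
    }
}
```

If the wait on isStartup() came before cleanUp(), as in the problem description, nothing would ever wake the blocked select() and the wait would spin until the deadline.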

Problem 2: Setting _isStartup & _isRunning to true when init() threw an unchecked exception.
The exception was nicely caught, but only logged. No action was taken, so _isStartup
& _isRunning were still set to true. As a result, the Agent thought it was connected
successfully, though it wasn't.

Resolution 2: Add a return to the catch statement [5]. This way _isStartup & _isRunning
aren't set to true.
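
A minimal sketch of Resolution 2, again under stated assumptions: StartGuardSketch and its serverReachable switch are hypothetical, and init() here is only a stand-in that fails on demand. The sketch uses java.net.ConnectException because the pr/863 summary names a ConnectException; the real exception type and logging in the Agent code may differ.

```java
import java.net.ConnectException;

// Hypothetical stand-in for the class whose start() is described above.
class StartGuardSketch {
    private final boolean serverReachable; // assumption: simulates the server state
    private volatile boolean isStartup;
    private volatile boolean isRunning;

    StartGuardSketch(boolean serverReachable) {
        this.serverReachable = serverReachable;
    }

    // Stand-in for init(): fails when no server is listening.
    private void init() throws ConnectException {
        if (!serverReachable) {
            throw new ConnectException("no server listening");
        }
    }

    void start() {
        try {
            init();
        } catch (ConnectException e) {
            System.err.println("Unable to connect: " + e.getMessage());
            return; // without this return, the flags below would still be set
        }
        isStartup = true;
        isRunning = true;
    }

    boolean isStartup() {
        return isStartup;
    }

    boolean isRunning() {
        return isRunning;
    }

    public static void main(String[] args) {
        StartGuardSketch down = new StartGuardSketch(false);
        down.start();
        System.out.println("server down, isRunning = " + down.isRunning());

        StartGuardSketch up = new StartGuardSketch(true);
        up.start();
        System.out.println("server up, isRunning = " + up.isRunning());
    }
}
```

With the early return in place, a failed init() leaves both flags false, so the caller can tell that the connection attempt did not succeed and try again.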

Steps to test:
1. Deploy ACS.
2. Try all combinations of stopping/starting management server/agent.


* pr/863:
  Added return statement to stop start() if there has been a ConnectException.
  Call cleanUp() before looping isStartup().

Signed-off-by: Rajani Karuturi <>

> [Blocker] KVM host goes into disconnected state when MS is restarted
> --------------------------------------------------------------------
>                 Key: CLOUDSTACK-8883
>                 URL:
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.6.0
>            Reporter: Raja Pullela
>            Assignee: Boris Schrijver
>            Priority: Blocker
>             Fix For: 4.6.0
> steps to reproduce:
> - restart MS
> - see the KVM host status
> Expected
> - Agent should reconnect
> Actual
> - Host stays in disconnected state and Agent does not reconnect
> Apparently a recent commit broke this, and BVTs for KVM are all failing because Hosts go
into a disconnected state and the SSVM/CPVMs don't come up.
> Current Agent Log - during the MS restart
> 2015-09-18 07:05:37,301 INFO  [] (agentRequest-Handler-5:null)
Asking libvirt to refresh storage pool c8bd627f-101f-3215-8545-7
> 2015-09-18 07:06:37,452 INFO  [] (agentRequest-Handler-1:null)
Trying to fetch storage pool c8bd627f-101f-3215-8545-72f7ce50f2c6 from libvirt
> 2015-09-18 07:06:37,469 INFO  [] (agentRequest-Handler-1:null)
Asking libvirt to refresh storage pool c8bd627f-101f-3215-8545-72f7ce50f2c6
> 2015-09-18 07:07:32,417 INFO  [cloud.agent.Agent] (Agent-Handler-5:null) Lost connection
to the server. Dealing with the remaining commands...
> Previously Agent used to reconnect -
> 2015-09-18 12:15:11,902 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Reconnecting...
> 2015-09-18 12:15:11,903 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting
> 2015-09-18 12:15:11,904 WARN  [utils.nio.NioConnection] (Agent-Selector:null) Unable
to connect to remote: is there a server running on port 8250
