cloudstack-issues mailing list archives

From "edison su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CLOUDSTACK-5452) KVM - Agent is not able to connect back if management server was restarted when there are pending tasks to this host.
Date Tue, 04 Nov 2014 20:44:55 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196733#comment-14196733 ]

edison su commented on CLOUDSTACK-5452:
---------------------------------------

This is due to a limitation of the current agent model: a running task cannot be cancelled on the agent side.
The problem is:
if a running task takes forever to finish, we can't do anything about it except
restart the agent and kill all the running processes spawned by the Java agent.
Human intervention is needed in this case. We have to kill these jobs manually; otherwise, the system
will be left in an inconsistent state.
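
For illustration only, here is a minimal Java sketch of what a cancellable task could look like. It is not CloudStack agent code: the class CancellableTaskSketch, the Outcome enum and runWithTimeout are hypothetical names, and the sketch assumes the task responds to thread interruption. A task that has spawned an external process would still need that process destroyed separately.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names exist in the CloudStack agent.
public class CancellableTaskSketch {

    /** Hypothetical result of running one command. */
    public enum Outcome { COMPLETED, CANCELLED, FAILED }

    /**
     * Runs the task on a worker thread and cancels it if it does not
     * finish within timeoutMillis. Cancellation works by interrupting
     * the worker, so it only stops tasks that honour interruption.
     */
    public static <T> Outcome runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // The task overran its budget: interrupt the worker thread.
            future.cancel(true);
            return Outcome.CANCELLED;
        } catch (ExecutionException e) {
            // The task itself threw an exception.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // The caller was interrupted while waiting: stop the task too.
            future.cancel(true);
            Thread.currentThread().interrupt();
            return Outcome.CANCELLED;
        } finally {
            // Release the worker thread in every case.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that finishes immediately.
        Outcome quick = runWithTimeout(() -> "done", 1000);
        // A task that would otherwise block for a minute.
        Outcome stuck = runWithTimeout(() -> { Thread.sleep(60_000); return null; }, 200);
        System.out.println(quick + " " + stuck);
    }
}
```

With something along these lines, a command still running when the connection is lost could be given a deadline instead of blocking the reconnect indefinitely. Whether that fits the agent's actual design is an open question.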

> KVM - Agent is not able to connect back if management server was restarted when there are pending tasks to this host.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5452
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5452
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.3.0
>         Environment: Build from 4.3
>            Reporter: Sangeetha Hariharan
>            Assignee: edison su
>            Priority: Critical
>             Fix For: 4.5.0
>
>
> KVM - Agent is not able to connect back if management server was restarted when there are pending tasks to this host.
> Steps to reproduce the problem:
> Set up - Advanced zone with 2 KVM (RHEL 6.3) hosts.
> Deployed a few VMs.
> Started snapshots for the ROOT volumes of the VMs.
> While the snapshot processes are still in progress, restart the management server.
> After the management server has started, the KVM hosts remain in disconnected state.
> Attempts to stop or start VMs fail because there is no connection to the host.
> Following is seen in agent logs:
> 2013-12-10 20:56:46,640 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:46,640 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:56:51,641 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:51,642 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:56:56,642 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:56,643 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:01,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:01,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:06,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:06,645 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:11,645 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:11,646 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:16,647 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:16,647 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:21,648 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:21,648 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:26,649 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:26,675 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:31,676 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:31,677 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:36,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:36,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:41,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost connection to the server. Dealing with the remaining commands...
> :



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
