cloudstack-users mailing list archives

From Indra Pramana <in...@sg.or.id>
Subject URGENT - CloudStack agent not able to connect to management server
Date Thu, 31 Mar 2016 02:14:59 GMT
Dear all,

We are using CloudStack 4.2.0 with the KVM hypervisor and Ceph RBD storage.
All our agents got disconnected from the management server and are unable to
reconnect, despite rebooting the management server and stopping and
restarting cloudstack-agent many times.

We even tried physically rebooting a hypervisor host (sacrificing all the
VMs running on it) to see whether it could reconnect after boot-up, but it
still cannot reconnect (it stays in the "Connecting" state). Here are
excerpts from the logs:

====
2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
}
2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null)
Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
v1, Flags: 100010,
[{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
}
2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource]
(UgentTask-5:null) Executing:
/usr/share/cloudstack-common/scripts/vm/network/security_group.py
get_rule_logs_for_vms
2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource]
(UgentTask-5:null) Execution is successful.
2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
}
2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
Received response: Seq 0-12:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
v1, Flags: 100010,
[{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
}
2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource]
(UgentTask-5:null) Executing:
/usr/share/cloudstack-common/scripts/vm/network/security_group.py
get_rule_logs_for_vms
2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource]
(UgentTask-5:null) Execution is successful.
2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
}
2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null)
Received response: Seq 0-13:  { Ans: , MgmtId: 161342671900, via: 75, Ver:
v1, Flags: 100010,
[{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
}
====
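
A minimal Python sketch, not part of CloudStack, for checking whether every
ping in an excerpt like the one above received an answer. It assumes the
log text is wrapped as shown, so the patterns allow any whitespace between
words. The function name unanswered_pings is hypothetical, and the sample
is a deliberately shortened copy of the lines above with the answer to
Seq 0-12 left out.

```python
import re

# Match on the wrapped log text; \s+ tolerates line breaks added by wrapping.
PING_RE = re.compile(r"Sending\s+ping:\s+Seq\s+(\d+-\d+)")
ANSWER_RE = re.compile(r"Received\s+response:\s+Seq\s+(\d+-\d+)")


def unanswered_pings(log_text):
    """Return the Seq values of pings that have no matching response."""
    sent = PING_RE.findall(log_text)
    answered = set(ANSWER_RE.findall(log_text))
    return [seq for seq in sent if seq not in answered]


# Shortened copy of the excerpt; the response to Seq 0-12 is omitted on purpose.
sample = (
    "2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)\n"
    "Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,\n"
    "2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null)\n"
    "Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75, Ver:\n"
    "2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)\n"
    "Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,\n"
)

print(unanswered_pings(sample))
```

Run against the full excerpt above, it should report no unanswered pings,
since Seq 0-11, 0-12 and 0-13 each have a response.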

On the existing hypervisor hosts, the agent normally gets stuck at this
stage, and in the CloudStack GUI we do not see the agent in the "Connecting"
state; it is in either the "Disconnected" or the "Alert" state.

====
2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing:
/bin/bash -c uname -r
2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution
is successful.
2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding
shutdown hook
2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent [id =
73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 :
host = 10.x.x.x : port = 8250
2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient] (Agent-Selector:null)
Connecting to 10.x.x.x:8250
2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient] (Agent-Selector:null)
SSL: Handshake done
2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient] (Agent-Selector:null)
Connected to 10.x.x.x:8250
====

We found no other significant or useful entries in either the agent or the
management server logs.

Can anyone give a clue as to what the problem could be? We have been trying
to reconnect for the past couple of hours without any success. Any help is
greatly appreciated.

Looking forward to your reply, thank you.

Cheers.

-ip-
