cloudstack-users mailing list archives

From Indra Pramana <in...@sg.or.id>
Subject Re: URGENT - CloudStack agent not able to connect to management server
Date Sun, 03 Apr 2016 11:43:35 GMT
Hi Lucian,

Good day to you, and thank you for your reply. Apologies for the delay in
my reply.

Yes, I can confirm that we can access the host and port specified. Based on
the logs, the host can connect to the management server, but none of the
follow-up log entries that usually appear after a connection are written.
Eventually, we could only reconnect the host by rebooting it, which meant
sacrificing all the VMs that were still up and running during the
disconnection.
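
For reference, the reachability check can be sketched as below. This is only
a minimal sketch, not an official CloudStack tool: the function names
parse_properties and can_connect are made up for illustration, the file path
and the fallback port 8250 are assumptions based on a default KVM agent
install, and it only tests that a TCP connection opens. It does not perform
the SSL handshake or the agent protocol.

```python
import socket


def parse_properties(text):
    """Parse Java-style key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed default location of the agent configuration; adjust as needed.
    path = "/etc/cloudstack/agent/agent.properties"
    with open(path) as fh:
        props = parse_properties(fh.read())
    host = props["host"]
    # Assumption: fall back to 8250 if the file does not set a port.
    port = int(props.get("port", "8250"))
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

In our case a check like this succeeds, which matches the "Connected to"
line in the agent log, so plain reachability does not explain why the agent
stalls afterwards.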

At the time when the first hypervisor was disconnected, the CloudStack
management servers were very busy handling the disconnections, trying to
fence the hosts and initiate HA for all the affected VMs, based on the
logs. Could this have put a strain on the management server, causing it to
disconnect all the remaining hosts? Would adding a new management server
resolve the problem?

Any advice is appreciated.

Looking forward to your reply, thank you.

Cheers.

On Thu, Mar 31, 2016 at 5:28 PM, Nux! <nux@li.nux.ro> wrote:

> Hello,
>
> Are you sure you can connect from the hypervisors to the
> cloudstack-management on the host and port specified in the
> agent.properties?
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> ----- Original Message -----
> > From: "Indra Pramana" <indra@sg.or.id>
> > To: users@cloudstack.apache.org
> > Sent: Thursday, 31 March, 2016 03:14:59
> > Subject: URGENT - CloudStack agent not able to connect to management server
>
> > Dear all,
> >
> > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage. All
> > our agents got disconnected from the management server and are unable to
> > reconnect, despite rebooting the management server and stopping and
> > restarting the cloudstack-agent many times.
> >
> > We even tried to physically reboot a hypervisor host (sacrificing all the
> > running VMs inside) to see if it could reconnect after boot-up, but it
> > still cannot reconnect (it stays in the "Connecting" state). Here are the
> > excerpts from the logs:
> >
> > ====
> > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> >
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > }
> > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null)
> > Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75,
> Ver:
> > v1, Flags: 100010,
> >
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > }
> > 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource]
> > (UgentTask-5:null) Executing:
> > /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> > get_rule_logs_for_vms
> > 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource]
> > (UgentTask-5:null) Execution is successful.
> > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> >
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > }
> > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
> > Received response: Seq 0-12:  { Ans: , MgmtId: 161342671900, via: 75,
> Ver:
> > v1, Flags: 100010,
> >
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > }
> > 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource]
> > (UgentTask-5:null) Executing:
> > /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> > get_rule_logs_for_vms
> > 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource]
> > (UgentTask-5:null) Execution is successful.
> > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> >
> [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > }
> > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null)
> > Received response: Seq 0-13:  { Ans: , MgmtId: 161342671900, via: 75,
> Ver:
> > v1, Flags: 100010,
> >
> [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > }
> > ====
> >
> > On the existing hypervisor hosts, the agent normally gets stuck at this
> > stage, and in the CloudStack GUI we don't see the agent in the
> > "Connecting" state; it is in either the "Disconnected" or "Alert" state.
> >
> > ====
> > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null)
> Executing:
> > /bin/bash -c uname -r
> > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution
> > is successful.
> > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding
> > shutdown hook
> > 2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent [id =
> > 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 :
> > host = 10.x.x.x : port = 8250
> > 2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> > Connecting to 10.x.x.x:8250
> > 2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> > SSL: Handshake done
> > 2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient] (Agent-Selector:null)
> > Connected to 10.x.x.x:8250
> > ====
> >
> > No other significant or useful entries were found in either the agent or
> > the management server logs.
> >
> > Can anyone give a clue as to what the problem could be? We have been
> > trying to reconnect for the past couple of hours without any success. Any
> > help is greatly appreciated.
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
> > -ip-
>
