cloudstack-users mailing list archives

From Indra Pramana <in...@sg.or.id>
Subject Re: URGENT - CloudStack agent not able to connect to management server
Date Wed, 06 Apr 2016 04:30:49 GMT
Hi Sanjeev and Rafael,

Good day to you, and thank you for your replies and advice.

We are getting a new management server and HAProxy load balancers, and will
see whether this resolves the problem.

Thank you.



On Tue, Apr 5, 2016 at 8:24 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> How many hosts (hypervisors) are you managing with a single MS?
>
> If you add new MSs, you need to balance access to them (HTTP 8080 and
> TCP 8250) with something like the HAProxy load balancer.
>
>
>
> On Tue, Apr 5, 2016 at 2:09 AM, Sanjeev Neelarapu <
> sanjeev.neelarapu@accelerite.com> wrote:
>
> > Adding an additional management server would definitely help.
> >
> > Best Regards,
> > Sanjeev N
> > Chief Product Engineer, Accelerite
> > Off: +91 40 6722 9368 | EMail: sanjeev.neelarapu@accelerite.com
> >
> >
> > -----Original Message-----
> > From: Indra Pramana [mailto:indra@sg.or.id]
> > Sent: Sunday, April 03, 2016 5:14 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: URGENT - CloudStack agent not able to connect to management
> > server
> >
> > Hi Lucian,
> >
> > Good day to you, and thank you for your reply. Apologies for the delay in
> > my reply.
> >
> > Yes, I can confirm that we can access the host and port specified. Based
> > on the logs, the host can connect to the management server, but there are
> > no follow-up log entries, which usually appear once it's connected.
> > Eventually, we could only reconnect the host after rebooting it, which
> > meant sacrificing all the VMs that were still up and running during the
> > disconnection.
> >
> > At the time when the first hypervisor was disconnected, the CloudStack
> > management servers were very busy handling the disconnections, trying to
> > fence the hosts and initiate HA for all the affected VMs, based on the
> > logs. Could this have put a strain on the management server, causing it
> > to disconnect all the remaining hosts? Will adding a new management
> > server be able to resolve the problem?
> >
> > Any advice is appreciated.
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
> > On Thu, Mar 31, 2016 at 5:28 PM, Nux! <nux@li.nux.ro> wrote:
> >
> > > Hello,
> > >
> > > Are you sure you can connect from the hypervisors to the
> > > cloudstack-management on the host and port specified in the
> > > agent.properties?
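
The connectivity question above can be checked with a short script. What follows is a minimal sketch, not part of the original thread: it assumes the agent configuration lives at /etc/cloudstack/agent/agent.properties and uses plain `host` and `port` keys with a single host value, and `read_properties` and `can_connect` are hypothetical helper names. It only tests that a TCP connection opens, so a successful result does not show that the agent's own handshake with the management server completes.

```python
import socket
from pathlib import Path


def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed location of the agent configuration; adjust for your install.
    path = Path("/etc/cloudstack/agent/agent.properties")
    props = read_properties(path.read_text())
    host = props.get("host", "")
    port = int(props.get("port", "8250"))  # 8250 is the agent port seen in the logs
    state = "reachable" if can_connect(host, port) else "unreachable"
    print(f"{host}:{port} is {state}")
```

Run it on each hypervisor as a user who can read the properties file.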
> > >
> > > --
> > > Sent from the Delta quadrant using Borg technology!
> > >
> > > Nux!
> > > www.nux.ro
> > >
> > > ----- Original Message -----
> > > > From: "Indra Pramana" <indra@sg.or.id>
> > > > To: users@cloudstack.apache.org
> > > > Sent: Thursday, 31 March, 2016 03:14:59
> > > > Subject: URGENT - CloudStack agent not able to connect to management
> > > > server
> > >
> > > > Dear all,
> > > >
> > > > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage.
> > > > All our agents got disconnected from the management server and are
> > > > unable to connect again, despite rebooting the management server and
> > > > stopping and restarting the cloudstack-agent many times.
> > > >
> > > > We even tried to physically reboot a hypervisor host (sacrificing
> > > > all the running VMs inside) to see if it could reconnect after
> > > > boot-up, but it is not able to reconnect (it stays in the "Connecting"
> > > > state). Here are the excerpts from the logs:
> > > >
> > > > ====
> > > > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > > 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 0-12:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > > 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) Received response: Seq 0-13:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > > ====
> > > >
> > > > On the existing hypervisor hosts, the agent normally gets stuck at
> > > > this stage, and in the CloudStack GUI we don't see the agent in the
> > > > "Connecting" state; it is in either the "Disconnected" or "Alert" state.
> > > >
> > > > ====
> > > > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing: /bin/bash -c uname -r
> > > > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution is successful.
> > > > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding shutdown hook
> > > > 2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent [id = 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 : host = 10.x.x.x : port = 8250
> > > > 2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.x.x.x:8250
> > > > 2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> > > > 2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.x.x.x:8250
> > > > ====
> > > >
> > > > We found no other significant or useful entries in either the agent
> > > > or the management server logs.
> > > >
> > > > Can anyone give a clue as to what the problem could be? We have been
> > > > trying to reconnect for the past couple of hours without any success.
> > > > Any help is greatly appreciated.
> > > >
> > > > Looking forward to your reply, thank you.
> > > >
> > > > Cheers.
> > > >
> > > > -ip-
> > >
> >
> >
> >
> >
>
>
>
> --
> Rafael Weingärtner
>
