cloudstack-users mailing list archives

From Rafael Weingärtner <rafaelweingart...@gmail.com>
Subject Re: URGENT - CloudStack agent not able to connect to management server
Date Tue, 05 Apr 2016 12:24:29 GMT
How many hosts (hypervisors) are you managing with a single MS?

If you add new MSs, you need to balance access to them (HTTP on port 8080
and TCP on port 8250) with something like the HAProxy load balancer.
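
Before putting a load balancer in front of several MSs, it may help to
confirm from a hypervisor that each MS accepts connections on both ports.
The following is only a minimal Python sketch: the function names are
hypothetical, the host names you pass in are your own, and it only tests
the TCP connect, not the SSL handshake that the agent performs on 8250.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_management_servers(hosts, ports=(8080, 8250)):
    """Map each (host, port) pair to whether a TCP connect succeeded.

    The default ports are the HTTP (8080) and agent (8250) ports
    mentioned above; adjust them if your setup differs.
    """
    return {(host, port): port_open(host, port)
            for host in hosts
            for port in ports}
```

Calling check_management_servers() with the list of your MS addresses
returns a dictionary you can print or inspect; any False entry points at a
port that a load balancer health check would most likely also see as down.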



On Tue, Apr 5, 2016 at 2:09 AM, Sanjeev Neelarapu <
sanjeev.neelarapu@accelerite.com> wrote:

> Adding an additional management server would definitely help.
>
> Best Regards,
> Sanjeev N
> Chief Product Engineer, Accelerite
> Off: +91 40 6722 9368 | EMail: sanjeev.neelarapu@accelerite.com
>
>
> -----Original Message-----
> From: Indra Pramana [mailto:indra@sg.or.id]
> Sent: Sunday, April 03, 2016 5:14 PM
> To: users@cloudstack.apache.org
> Subject: Re: URGENT - CloudStack agent not able to connect to management
> server
>
> Hi Lucian,
>
> Good day to you, and thank you for your reply. Apologies for the delay in
> my reply.
>
> Yes, I can confirm that we can access the host and port specified. Based
> on the logs, the host can connect to the management server, but there are
> none of the follow-up log entries that usually appear after it has
> connected. Eventually, we could only reconnect the host after rebooting
> it, which meant sacrificing all the VMs that were still up and running
> during the disconnection.
>
> At the time when the first hypervisor was disconnected, the CloudStack
> management servers were very busy handling the disconnections, trying to
> fence the hosts and initiate HA for all the affected VMs, based on the
> logs. Could this have put a strain on the management server, causing it to
> disconnect all the remaining hosts? Will adding a new management server
> resolve the problem?
>
> Any advice is appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> On Thu, Mar 31, 2016 at 5:28 PM, Nux! <nux@li.nux.ro> wrote:
>
> > Hello,
> >
> > Are you sure you can connect from the hypervisors to the
> > cloudstack-management on the host and port specified in the
> > agent.properties?
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
> >
> > ----- Original Message -----
> > > From: "Indra Pramana" <indra@sg.or.id>
> > > To: users@cloudstack.apache.org
> > > Sent: Thursday, 31 March, 2016 03:14:59
> > > > Subject: URGENT - CloudStack agent not able to connect to management
> > > > server
> >
> > > Dear all,
> > >
> > > > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage.
> > > > All our agents got disconnected from the management server and are
> > > > unable to connect again, despite rebooting the management server and
> > > > stopping and restarting the cloudstack-agent many times.
> > >
> > > > We even tried to physically reboot a hypervisor host (sacrificing
> > > > all the running VMs inside) to see if it could reconnect after
> > > > boot-up, and it is not able to reconnect (it stays in the
> > > > "Connecting" state). Here are the excerpts from the logs:
> > >
> > > ====
> > > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > > Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags:
> > > 11,
> > >
> > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupState
> > s":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,
> > "hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent]
> > > (Agent-Handler-2:null) Received response: Seq 0-11:  { Ans: ,
> > > MgmtId: 161342671900, via: 75,
> > Ver:
> > > v1, Flags: 100010,
> > >
> > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","
> > hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > 2016-03-31 10:08:49,271 DEBUG
> > > [kvm.resource.LibvirtComputingResource]
> > > (UgentTask-5:null) Executing:
> > > /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> > > get_rule_logs_for_vms
> > > 2016-03-31 10:08:49,350 DEBUG
> > > [kvm.resource.LibvirtComputingResource]
> > > (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > > Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags:
> > > 11,
> > >
> > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupState
> > s":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,
> > "hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent]
> > > (Agent-Handler-3:null) Received response: Seq 0-12:  { Ans: ,
> > > MgmtId: 161342671900, via: 75,
> > Ver:
> > > v1, Flags: 100010,
> > >
> > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","
> > hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > 2016-03-31 10:09:49,272 DEBUG
> > > [kvm.resource.LibvirtComputingResource]
> > > (UgentTask-5:null) Executing:
> > > /usr/share/cloudstack-common/scripts/vm/network/security_group.py
> > > get_rule_logs_for_vms
> > > 2016-03-31 10:09:49,345 DEBUG
> > > [kvm.resource.LibvirtComputingResource]
> > > (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null)
> > > Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags:
> > > 11,
> > >
> > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupState
> > s":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,
> > "hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent]
> > > (Agent-Handler-4:null) Received response: Seq 0-13:  { Ans: ,
> > > MgmtId: 161342671900, via: 75,
> > Ver:
> > > v1, Flags: 100010,
> > >
> > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","
> > hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > ====
> > >
> > > > On the existing hypervisor hosts, the agent would normally get
> > > > stuck at this stage, and in the CloudStack GUI we don't see the
> > > > agent in the "Connecting" state; it is in either the
> > > > "Disconnected" or the "Alert" state.
> > >
> > > ====
> > > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null)
> > Executing:
> > > /bin/bash -c uname -r
> > > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null)
> > > Execution is successful.
> > > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding
> > > shutdown hook
> > > 2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent
> > > [id =
> > > 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers =
> 5 :
> > > host = 10.x.x.x : port = 8250
> > > 2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient]
> > > (Agent-Selector:null) Connecting to 10.x.x.x:8250
> > > 2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient]
> > > (Agent-Selector:null)
> > > SSL: Handshake done
> > > 2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient]
> > > (Agent-Selector:null) Connected to 10.x.x.x:8250 ====
> > >
> > > > No other significant or useful entries were found in either the
> > > > agent or the management server logs.
> > >
> > > > Can anyone give a clue as to what the problem could be? We have
> > > > been trying to reconnect for the past couple of hours without any
> > > > success. Any help is greatly appreciated.
> > > >
> > > > Looking forward to your reply, thank you.
> > >
> > > Cheers.
> > >
> > > -ip-
> >



-- 
Rafael Weingärtner
