cloudstack-dev mailing list archives

From Simon Weller <swel...@ena.com>
Subject Re: [update] ACS management unable to connect to xenserver hosts after reboot
Date Wed, 17 Feb 2016 18:10:31 GMT
Stephan,

When you restart the management process, do you see any logs indicating it's trying to peer
with another management server?

- Si

________________________________________
From: Stephan Seitz <s.seitz@secretresearchfacility.com>
Sent: Wednesday, February 17, 2016 9:28 AM
To: dev@cloudstack.apache.org
Cc: Glenn Wagner
Subject: Re: [update] ACS management unable to connect to xenserver hosts after reboot

Glenn,

thanks for your reply. Unfortunately the SSVM has been destroyed.

We don't have any firewall in between. ACS and XenServers are located in
the same /22. I've double checked every connection and there's no
iptables or similar in the way.
Instead of the SSVM, I've just successfully checked if the consoleproxy
VM is able to connect to Port 8250.
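
For anyone who wants to repeat the port check without telnet, here is a minimal sketch in Python. The helper name `can_connect` and the 3-second timeout are illustrative assumptions, not part of CloudStack. It only shows that a TCP handshake to the given port succeeds, nothing about the agent protocol behind it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical usage from a system VM (replace the address with your
# management server's IP):
#   can_connect("<management-server-ip>", 8250)
```

A result of True rules out a plain firewall block on that port, which matches what the console proxy VM check showed.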

To me it looks like there's some strange "identity" problem.

mysql> select * from mshost;
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
| id | msid           | runid         | name             | state | version | service_ip | service_port | last_update         | removed | alert_count |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
|  1 | 57177340185274 | 1455209855143 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-12 16:55:56 | NULL    |           0 |
|  3 | 57177340185273 | 1455639355379 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-17 11:31:50 | NULL    |           0 |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
2 rows in set (0.00 sec)

Indeed, there is (and always has been) only one management host in this
infrastructure.
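
To make the "identity" suspicion concrete, here is a small Python sketch that flags a management server registered under more than one msid. The rows are copied from the mshost output above. The function name `find_conflicting_msids` and the grouping by name and service_ip are assumptions for illustration only. It does not query the database and says nothing about how CloudStack derives the msid.

```python
from collections import defaultdict

# Rows copied from the mshost output above (only the columns needed here).
MSHOST_ROWS = [
    {"id": 1, "msid": 57177340185274, "name": "acs-management-1",
     "service_ip": "10.97.13.1", "removed": None},
    {"id": 3, "msid": 57177340185273, "name": "acs-management-1",
     "service_ip": "10.97.13.1", "removed": None},
]


def find_conflicting_msids(rows):
    """Group non-removed rows by (name, service_ip); keep groups with >1 msid."""
    groups = defaultdict(set)
    for row in rows:
        if row["removed"] is None:  # NULL in the table means "not removed"
            groups[(row["name"], row["service_ip"])].add(row["msid"])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


conflicts = find_conflicting_msids(MSHOST_ROWS)
for (name, ip), msids in conflicts.items():
    print(f"{name} ({ip}) is registered under {len(msids)} msids: {msids}")
# prints: acs-management-1 (10.97.13.1) is registered under 2 msids: [57177340185273, 57177340185274]
```

The two msids differ only in the last digit, and the log excerpt quoted further down uses MgmtId 57177340185273, the one from the row with id 3.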

With SQL dumps at hand, we removed the second row and purged all the
jobs related to that id, but after restarting cloudstack-management,
this entry was created again.

Maybe I'm completely wrong, but is it possible that our management host
"thinks" there's another management host responsible for our cluster?

Since we've been fiddling with this for at least two days without any
success, I'm willing to pay for a few consulting hours on it.

cheers,

- Stephan

btw. sorry, if this is a double post, but I think the list ate my last
mail...


On Tuesday, 16.02.2016, 20:39 +0000, Glenn Wagner wrote:
> Hi Stephan,
>
> Check that you can telnet to port 8250 on the management server from
> the SSVM, and check that iptables has been set up correctly.
> It looks like a firewall issue on the ACS management server.
>
> Thanks
> Glenn
>
> -----Original Message-----
> From: Stephan Seitz [mailto:s.seitz@secretresearchfacility.com]
> Sent: Tuesday, 16 February 2016 5:19 PM
> To: users@cloudstack.apache.org
> Cc: dev@cloudstack.apache.org
> Subject: [update] ACS management unable to connect to xenserver hosts
> after reboot
>
> Hi again!
>
> I think we've found the root cause, but are unable to mitigate it:
>
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req: Routing to peer
> 2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req: Cancel request received
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId 57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is closed
> 2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req: Routing to peer
> 2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req: Cancel request received
> 2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId 57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
>
> Here's a longer excerpt from the logfile during startup:
>
> http://pastebin.com/SftVJCs4
>
> Maybe someone knows how to resolve this? To me it looks like our
> single management-host has some kind of identity crisis?
>
>
> On Tuesday, 16.02.2016, 15:12 +0100, Stephan Seitz wrote:
> > Hi acs gurus!
> >
> > We're currently facing a really strange problem after two somewhat
> > simple steps.
> > 1. Reboot of the management node (a second NFS storage is also
> > located there)
> > 2. Upgrade 4.7.0 to 4.7.1
> >
> > Both steps seemed successful and running, but after a few days I've
> > noticed the SSVM in "running, not connected" state, so I decided to
> > restart the SSVM. That's where all the trouble began...
> >
> > I've pasted a somewhat repetitive log excerpt here:
> > http://pastebin.com/8MM6XUBk
> >
> > If I try to (force) reconnect a host, we get huge repetitive log
> > entries like the ones pasted here: http://pastebin.com/cNR3TtkG
> >
> > CloudMonkey quits with the following response:
> >
> > (local) 🐵 > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> > Error Connection refused by server: ('Connection aborted.',
> > BadStatusLine("''",))
> >
> >
> > I've tcpdump'ed the relevant traffic between management and the
> > XenServers and found simply nothing except some (I assume) unrelated
> > NFS packets.
> >
> > Could someone please shed some light on how to fix that?
> >
> > Thanks in advance!
> >
> > - Stephan
>

