cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dean Close <dean.cl...@icloudhosting.com>
Subject Redundant Router Interfaces
Date Fri, 01 Apr 2016 18:13:45 GMT
Hi guys,

I had been investigating a possible bug with the way interfaces are managed on virtual routers.
The public interfaces are being brought up on backup routers and (because they boot second)
they arp the IPs away from the master. I'd been examining an idea for a fix but whilst doing
so I found that the system appears to be designed to bring up these interfaces.

I suspect that a few things need to be reworked - but the changes necessary go so far against
what has been implemented that I wanted to open this up before doing the work.

Hopefully if I go through my findings you guys can help me see what I might be getting wrong.

The following was correct for pre-4.6 redundant routers:
 
  1. Both routers get configured with IP addresses, routes and iptables rules.
  2. Public interfaces are initially set as DOWN.
  3. Keepalived runs a VRRP instance on the private interface (eth0) to negotiate MASTER/BACKUP
roles.
  4. Keepalived manages the virtual IP on eth0 used as the public gateway for the guest VMs.
  5. Keepalived uses a master notify script to bring up the public interfaces.

The above was true for pre-4.6 routers. Now, however, things appear to work differently:

  1. Both routers get configured as before.
  2. All interfaces apart from eth1 (the Hypervisor-link interface) are set as UP.
  3. Keepalived runs a VRRP instance on the first public interface (eth2) to negotiate MASTER/BACKUP
roles.
  4. Keepalived manages the virtual IP as before.
  5. Keepalived uses a master notify script to bring up the public interfaces (unnecessary)
  6. Keepalived uses a backup notify script to bring down the public interfaces (unused)

This is unexpected for the following reasons:

  1. The keepalived notify script brings the public interfaces down when transitioning to
BACKUP - so how can we expect to run a VRRP instance over eth2?
  2. If interfaces are down when transitioning to BACKUP, why are they not expected to be
down to begin with? (Before the router has become MASTER)
  3. Why are we running a VRRP instance over an interface with an IP that will clash with
another host on the network?

The following method from the CsIP class in /opt/cloud/bin/cs/CsAddress.py confuses matters
futher:

    def check_is_up(self):
         """ Ensure device is up """
         cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
         for i in CsHelper.execute(cmd):
             if " DOWN " in i:
                 cmd2 = "ip link set %s up" % self.getDevice()
                 # If redundant only bring up public interfaces that are not eth1.
                 # Reason: private gateways are public interfaces.
                 # master.py and keepalived will deal with eth1 public interface. 
                 if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not
in PUBLIC_INTERFACE):
                     CsHelper.execute(cmd2)
                 # if not redundant bring everything up
                 if not self.cl.is_redundant():
                     CsHelper.execute(cmd2)
 
The comments refer to eth1 as a public interface when this is the link to the hypervisor.
Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1']. But keepalived and master.py don't
influence eth1 at all. This looks like a mistake.

Lastly, the logic of this line looks flawed:

  if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE)

As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will be ignored. Public
IPs will never be assigned to eth1, so this line evaluates as:

  
  if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)

which reduces even further to:

  if self.cs.is_redundant() and self.is_control()


What would need doing
---------------------

  1. The keepalived.conf template would need to be changed to run the VRRP instance over eth0.
  2. The check_is_up method of the CsIP class should be renamed to 'bring_up_interfaces'.
For redundant routers it should ignore IPs that pass is_public or needs_vrrp.
  3. The arpPing method should do nothing if the interface is down.
  4. The PUBLIC_INTERFACE constant should be either renamed or dropped altogether.
  5. Other things that I haven't considered?


I'd really appreciate any feedback on this. It's possible that I've got it all wrong but I'm
suspecting not. I just don't want to tread on anyone's toes by submitting a PR that goes against
what appears to be an explicit design decision.


Kind regards,

Dean Close
iCloudHosting.com
http://www.icloudhosting.com
Tel: 01582 227927

Unit 2, Smallmead Road, Reading RG2 0QS

******************************************************************
The names iCloudHosting and iCloudHosting.com are trading styles of BBS Commerce Ltd which
is registered in England and Wales, Company Number 04837714. Please use our trading address
above for mail. Our registered office is 5 Theale Lakes Business Park, Moulden Way, Sulhamstead,
Reading, Berkshire, RG7 4GB. VAT Registration Number GB 982 8230 94.

This email and any files transmitted with it are confidential and intended solely for the
use of the individual or entity to whom they are addressed. If you are not the intended recipient
you are not authorised to and must not disclose, copy, distribute, or retain this message
or any part of it.

iCloudHosting accepts no responsibility for information, errors or omissions in this email.
******************************************************************


Mime
View raw message