From: Wei ZHOU
To: "dev@cloudstack.apache.org"
Date: Fri, 1 Apr 2016 21:14:19 +0200
Subject: Re: Redundant Router Interfaces

Dean,

I just fixed it yesterday.
The commit is:

---
diff --git a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
index 5f63c06..5256d03 100755
--- a/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
+++ b/systemvm/patches/debian/config/opt/cloud/bin/cs/CsAddress.py
@@ -27,7 +27,7 @@ from CsRoute import CsRoute
 from CsRule import CsRule

 VRRP_TYPES = ['guest']
-PUBLIC_INTERFACE = ['eth1']
+VPC_PUBLIC_INTERFACE = ['eth1']

 class CsAddress(CsDataBag):

@@ -323,7 +323,7 @@ class CsIP:
                 # If redundant only bring up public interfaces that are not eth1.
                 # Reason: private gateways are public interfaces.
                 # master.py and keepalived will deal with eth1 public interface.
-                if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE):
+                if self.cl.is_redundant() and (not self.is_public() or (self.config.is_vpc() and self.getDevice() not in VPC_PUBLIC_INTERFACE)):
                     CsHelper.execute(cmd2)
                 # if not redundant bring everything up
                 if not self.cl.is_redundant():
---

- Wei

2016-04-01 20:13 GMT+02:00 Dean Close:

> Hi guys,
>
> I had been investigating a possible bug with the way interfaces are
> managed on virtual routers. The public interfaces are being brought up on
> backup routers and (because they boot second) they arp the IPs away from
> the master. I'd been examining an idea for a fix but whilst doing so I
> found that the system appears to be designed to bring up these interfaces.
>
> I suspect that a few things need to be reworked - but the changes
> necessary go so far against what has been implemented that I wanted to open
> this up before doing the work.
>
> Hopefully if I go through my findings you guys can help me see what I
> might be getting wrong.
>
> The following was correct for pre-4.6 redundant routers:
>
> 1. Both routers get configured with IP addresses, routes and iptables
>    rules.
> 2. Public interfaces are initially set as DOWN.
> 3. Keepalived runs a VRRP instance on the private interface (eth0) to
>    negotiate MASTER/BACKUP roles.
> 4. Keepalived manages the virtual IP on eth0 used as the public gateway
>    for the guest VMs.
> 5. Keepalived uses a master notify script to bring up the public
>    interfaces.
>
> The above was true for pre-4.6 routers. Now, however, things appear to
> work differently:
>
> 1. Both routers get configured as before.
> 2. All interfaces apart from eth1 (the Hypervisor-link interface) are
>    set as UP.
> 3. Keepalived runs a VRRP instance on the first public interface (eth2)
>    to negotiate MASTER/BACKUP roles.
> 4. Keepalived manages the virtual IP as before.
> 5. Keepalived uses a master notify script to bring up the public
>    interfaces (unnecessary).
> 6. Keepalived uses a backup notify script to bring down the public
>    interfaces (unused).
>
> This is unexpected for the following reasons:
>
> 1. The keepalived notify script brings the public interfaces down when
>    transitioning to BACKUP - so how can we expect to run a VRRP instance
>    over eth2?
> 2. If interfaces are down when transitioning to BACKUP, why are they not
>    expected to be down to begin with? (Before the router has become MASTER)
> 3. Why are we running a VRRP instance over an interface with an IP that
>    will clash with another host on the network?
>
> The following method from the CsIP class in /opt/cloud/bin/cs/CsAddress.py
> confuses matters further:
>
>     def check_is_up(self):
>         """ Ensure device is up """
>         cmd = "ip link show %s | grep 'state DOWN'" % self.getDevice()
>         for i in CsHelper.execute(cmd):
>             if " DOWN " in i:
>                 cmd2 = "ip link set %s up" % self.getDevice()
>                 # If redundant only bring up public interfaces that are not eth1.
>                 # Reason: private gateways are public interfaces.
>                 # master.py and keepalived will deal with eth1 public interface.
>                 if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE):
>                     CsHelper.execute(cmd2)
>                 # if not redundant bring everything up
>                 if not self.cl.is_redundant():
>                     CsHelper.execute(cmd2)
>
> The comments refer to eth1 as a public interface when this is the link to
> the hypervisor. Indeed, PUBLIC_INTERFACE is defined on line 31 as ['eth1'].
> But keepalived and master.py don't influence eth1 at all. This looks like a
> mistake.
>
> Lastly, the logic of this line looks flawed:
>
>     if self.cl.is_redundant() and (not self.is_public() or self.getDevice() not in PUBLIC_INTERFACE)
>
> As PUBLIC_INTERFACE is limited to eth1, the `not self.is_public()` will be
> ignored. Public IPs will never be assigned to eth1, so this line evaluates
> as:
>
>     if self.cl.is_redundant() and (self.getDevice() not in PUBLIC_INTERFACE)
>
> which reduces even further to:
>
>     if self.cl.is_redundant() and self.is_control()
>
>
> What would need doing
> ---------------------
>
> 1. The keepalived.conf template would need to be changed to run the VRRP
>    instance over eth0.
> 2. The check_is_up method of the CsIP class should be renamed to
>    'bring_up_interfaces'. For redundant routers it should ignore IPs that
>    pass is_public or needs_vrrp.
> 3. The arpPing method should do nothing if the interface is down.
> 4. The PUBLIC_INTERFACE constant should be either renamed or dropped
>    altogether.
> 5. Other things that I haven't considered?
>
>
> I'd really appreciate any feedback on this. It's possible that I've got it
> all wrong but I suspect not. I just don't want to tread on anyone's
> toes by submitting a PR that goes against what appears to be an explicit
> design decision.
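
For reference, here is a minimal sketch of the guard in check_is_up before and after the patch above, written as stand-alone Python so the two conditions can be compared side by side. The function names old_brings_up and new_brings_up and their boolean parameters are hypothetical stand-ins for is_redundant(), is_public(), is_vpc() and getDevice(). It models only the boolean logic and is not the CloudStack code itself.

```python
# Hypothetical stand-alone model of the guard in CsIP.check_is_up().
# The function and parameter names are illustrative only; they are not
# part of the CloudStack code base.
from itertools import product

VPC_PUBLIC_INTERFACE = ['eth1']  # called PUBLIC_INTERFACE before the patch


def old_brings_up(redundant, public, device):
    """Pre-patch decision: would a DOWN interface be set UP?"""
    if not redundant:
        return True  # non-redundant routers bring everything up
    return (not public) or device not in VPC_PUBLIC_INTERFACE


def new_brings_up(redundant, public, vpc, device):
    """Post-patch decision, with the additional is_vpc() check."""
    if not redundant:
        return True  # unchanged: non-redundant routers bring everything up
    return (not public) or (vpc and device not in VPC_PUBLIC_INTERFACE)


if __name__ == "__main__":
    # Enumerate every combination for a redundant router.
    for public, vpc, device in product([False, True], [False, True],
                                       ['eth1', 'eth2']):
        print("public=%-5s vpc=%-5s dev=%s old=%-5s new=%s" % (
            public, vpc, device,
            old_brings_up(True, public, device),
            new_brings_up(True, public, vpc, device)))
```

In this model the old guard brings up a public address on any device other than eth1 on a redundant router, whatever its VRRP role, which would be consistent with the behaviour reported in the quoted message. The patched guard does so only when the router is a VPC router. If public IPs are never assigned to eth1, as the quoted message states, the old condition is true for every address on a redundant router.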
>
>
> Kind regards,
>
> Dean Close
> iCloudHosting.com
> http://www.icloudhosting.com
> Tel: 01582 227927
>
> Unit 2, Smallmead Road, Reading RG2 0QS