From: Simon Weller
To: "users@cloudstack.apache.org", "dev@cloudstack.apache.org"
Subject: Re: [DISCUSS] VR upgrade downtime reduction
Date: Tue, 1 May 2018 13:20:28 +0000

Yes, nice work!

________________________________
From: Daan Hoogland
Sent: Tuesday, May 1, 2018 5:28 AM
To: users@cloudstack.apache.org
Cc: dev
Subject: Re: [DISCUSS] VR upgrade downtime reduction

good work Rohit,
I'll review 2508 https://github.com/apache/cloudstack/pull/2508

On Tue, May 1, 2018 at 12:08 PM, Rohit Yadav wrote:

> All,
>
> A short-term solution to VR upgrade or network restart (with cleanup=true)
> has been implemented:
>
> - The strategy for redundant VRs builds on top of Wei's original patch
> where backup routers are removed and replaced on a rolling basis. The
> downtime I saw was usually 0-2 seconds, and theoretically the downtime is
> in the range [0, 3*advertisement interval + skew seconds], or 0-10 seconds
> (with CloudStack's default of 1s advertisement interval).
>
> - For non-redundant routers, I've implemented a strategy where first a new
> VR is deployed, then the old VR is powered off/destroyed, and the new VR is
> again re-programmed.
> With this strategy, two identical VRs may be up for a
> brief moment (a few seconds) where both can serve traffic; however, the new
> VR performs an arp-ping on its interfaces to update neighbours. After the
> old VR is removed, the new VR is re-programmed, which among many things
> performs another arp-ping. The theoretical downtime is therefore limited by
> the ARP cache refresh, which can take up to 30 seconds. In my experiments
> against various VMware, KVM and XenServer versions, I found that the
> downtime was indeed less than 30s, usually between 5 and 20 seconds.
> Compared to older ACS versions, especially in cases where VR deployment
> requires a full volume copy (as in VMware), a 10x-12x improvement was seen.
>
> Please review and test the following PR, which has test details, benchmarks,
> and some screenshots:
>
> https://github.com/apache/cloudstack/pull/2508
>
> Future work can be driven towards making all VRs redundant-enabled by
> default, which can allow for a firewall+connections state transfer
> (conntrackd + VRRP2/3 based) during rolling reboots.
>
> - Rohit
>
> ________________________________
> From: Daan Hoogland
> Sent: Thursday, February 8, 2018 3:11:51 PM
> To: dev
> Subject: Re: [DISCUSS] VR upgrade downtime reduction
>
> to stop the vote and continue the discussion. I personally want unification
> of all router vms: VR, 'shared network', rVR, VPC, rVPC, and eventually the
> one we want to create for 'enterprise topology hand-off points'. And I
> think we have some level of consensus on that but the path there is a
> concern for Wido and for some of my colleagues as well, and rightly so. One
> issue is upgrades from older versions.
>
> I see the common scenario as follows:
> + redundancy is deprecated and only the number of instances remains.
> + an old VR is replicated in memory by a redundant-enabled version, that
> will be in a state of running but inactive.
> - the old one will be destroyed while a ping is running
> - as soon as the ping fails more than three times in a row (this might have
> to have a hypervisor-specific implementation or require a helper vm)
> + the new one is activated
>
> after this upgrade Wei's and/or Remi's code will do the work for any
> following upgrade.
>
> flames, please
>
> On Wed, Feb 7, 2018 at 12:17 PM, Nux! wrote:
>
> > +1 too
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
> >
>
> rohit.yadav@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> @shapeblue
>
> > ----- Original Message -----
> > > From: "Rene Moser"
> > > To: "dev"
> > > Sent: Wednesday, 7 February, 2018 10:11:45
> > > Subject: Re: [DISCUSS] VR upgrade downtime reduction
> >
> > > On 02/06/2018 02:47 PM, Remi Bergsma wrote:
> > >> Hi Daan,
> > >>
> > >> In my opinion the biggest issue is the fact that there are a lot of
> > >> different code paths: VPC versus non-VPC, VPC versus redundant-VPC,
> > >> etc. That's why you cannot simply switch from a single VPC to a
> > >> redundant VPC for example.
> > >>
> > >> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a
> > >> VPC with a single tier and made sure all features are supported. Next
> > >> we merged the single and redundant VPC code paths. The idea here is
> > >> that redundancy or not should only be a difference in the number of
> > >> routers. Code should be the same. A single router is also "master" but
> > >> there just is no "backup".
> > >>
> > >> That simplifies things A LOT, as keepalived is now the master of the
> > >> whole thing. No more assigning ip addresses in Python, but leave that
> > >> to keepalived instead. Lots of code deleted. Easier to maintain, way
> > >> more stable. We just released Cosmic 6 that has this feature and are
> > >> now rolling it out in production. Looking good so far.
> > >> This change unlocks a lot of possibilities, like live upgrading from a
> > >> single VPC to a redundant one (and back). In the end, if the redundant
> > >> VPC is rock solid, you most likely don't even want single VPCs any
> > >> more. But that will come.
> > >>
> > >> As I said, we're rolling this out as we speak. In a few weeks when
> > >> everything is upgraded I can share what we learned and how well it
> > >> works. CloudStack could use a similar approach.
> > >
> > > +1 Pretty much this.
> > >
> > > René
> >
>
> --
> Daan

--
Daan
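
For reference, the redundant-VR bound quoted from Rohit's message (3*advertisement interval + skew) has the same shape as the VRRPv2 Master_Down_Interval timer defined in RFC 3768, where the skew is (256 - priority) / 256 seconds. The sketch below only evaluates that formula; it is not CloudStack or keepalived code, the function names are made up for illustration, and the priority of 100 is an assumed example value rather than something stated in the thread.

```python
def skew_time(priority: int) -> float:
    """VRRPv2 skew time in seconds: (256 - priority) / 256 (RFC 3768)."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    return (256 - priority) / 256.0


def master_down_interval(advert_interval_s: float, priority: int) -> float:
    """Time a backup waits without advertisements before taking over.

    Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time.
    """
    if advert_interval_s <= 0:
        raise ValueError("advertisement interval must be positive")
    return 3 * advert_interval_s + skew_time(priority)


if __name__ == "__main__":
    # 1 s advertisement interval (the default mentioned in the thread),
    # priority 100 (hypothetical example value).
    print(master_down_interval(1.0, 100))  # 3.609375
```

With a 1 s advertisement interval the formula gives a takeover delay of a little under 4 seconds for any valid priority, which sits inside the 0-10 second range mentioned in the thread. A higher priority yields a smaller skew, so the highest-priority backup times out first.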