Date: Wed, 1 Apr 2015 12:34:04 -0400
Subject: Load Balancer (HAProxy) - TPROXY passthrough
From: Logan Barfield
To: dev@cloudstack.apache.org

We've been running into some issues with the Advanced Zone/Isolated Network Load Balancer, and in working through them we've come up with some ideas for how the functionality can be improved.

The first issue we hit was with HTTP load balancing. We had a site that was sending back larger than average HTTP headers with 302 redirects. This was causing HAProxy to return 502 errors to clients. This is apparently a known issue with HAProxy when using the default "tune.bufsize" and "tune.maxrewrite" settings. The official HAProxy documentation recommends changing these from the defaults.

We were able to work around the problem by manually setting "tune.maxrewrite 1024" in the haproxy.cfg on the virtual router. This resolved most of the 502 errors, and would have probably resolved all of them with more tuning. The problem is that this change obviously wouldn't survive upgrades or VR rebuilds.

To fix the problem on a more permanent basis we changed the KeepAliveEnabled Network Offering setting introduced a few versions ago. This directs HAProxy to use TCP mode instead of HTTP mode for rules configured on port 80.

This solution works for the most part, but there are a couple of problems:

1) There doesn't appear to be support for this setting in the UI. That's understandable as the UI is way behind the current feature set.
2) There doesn't appear to be support for this setting in the API, either when creating or updating Network Offerings. This is a bit of a problem. We had to make the change in the database directly, which is very dirty.

3) In TCP mode HAProxy doesn't look at HTTP headers, so it can't send the real client IP to HTTP/Nginx in the X-Forwarded-For header.

To fix these issues I suggest the following changes be made:

1) Add the "KeepAliveEnabled" option to the Network Offering API commands. I really have no idea how to do this, so I'll try to flag the original committer to see if they can do so.

2) Add a new option for TPROXY support. The current VR kernel and HAProxy version have TPROXY support built in, so having the option (on a per LB rule basis) would be great. This would allow for using TCP mode in HAProxy, while still passing the real client IP through to the backend services.

To accomplish this I would suggest adding the necessary iptables rules to the VR either by default, or when Load Balancing is first enabled. Then a flag can be added to the create LB rule command to either enable or disable the transparent proxy setting.

The necessary iptables and routing rules are:

iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 111
iptables -t mangle -A DIVERT -j ACCEPT
ip rule add fwmark 111 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

Then for each LB rule created with the "transparent" option enabled, add the following setting to the rule configuration:

source 0.0.0.0 usesrc clientip

The last configuration change is to remove the user/group or uid/gid options in the haproxy config, otherwise HAProxy won't start with "usesrc" enabled.

It is also recommended to enable /proc/sys/net/ipv4/conf/eth0/send_redirects, but I haven't noticed any issues with it disabled either, and I don't know what else it might affect.
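
As a rough illustration of the per-rule "transparent" flag from suggestion 2, here is a minimal shell sketch that prints an HAProxy "listen" section for one LB rule and adds the "usesrc" line only when the flag is set. The function name, its argument order, the "true"/"false" flag values, the rule name, and the addresses are all hypothetical; this is not how CloudStack currently generates its config, only a sketch of the idea.

```shell
#!/bin/sh
# Sketch only: prints an HAProxy "listen" section for a single LB rule.
# render_lb_rule, its arguments, and the "true"/"false" flag are assumptions
# made for illustration; they are not part of CloudStack or HAProxy.
#
# Usage: render_lb_rule NAME BIND_ADDR TRANSPARENT BACKEND...
render_lb_rule() {
    name="$1"
    bind_addr="$2"
    transparent="$3"
    shift 3

    printf 'listen %s\n' "$name"
    printf '    bind %s\n' "$bind_addr"
    printf '    mode tcp\n'
    printf '    balance roundrobin\n'

    if [ "$transparent" = "true" ]; then
        # Directive quoted from the message: connect to the backends
        # using the client's address as the source (needs TPROXY rules).
        printf '    source 0.0.0.0 usesrc clientip\n'
    fi

    # One "server" line per remaining argument (backend host:port).
    i=0
    for backend in "$@"; do
        i=$((i + 1))
        printf '    server backend%d %s check\n' "$i" "$backend"
    done
}

# Example with placeholder addresses.
render_lb_rule web_80 203.0.113.10:80 true 10.1.1.11:80 10.1.1.12:80
```

With the flag set to anything other than "true" the section is rendered without the "source" line, so existing rules would keep their current behavior.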

3) The last suggestion would be to move away from hard coding configuration directives for VR services (like HAProxy: https://github.com/apache/cloudstack/blob/5091d0f5c5b03cb8658f2d974103261341080825/core/src/com/cloud/network/HAProxyConfigurator.java). Hard coding them makes implementing changes a hassle, since it involves rebuilding/upgrading CloudStack to accomplish anything, even small edits. For a production environment this is ill advised if not impossible.

In general it would make sense to make persistent changes to VR services possible without recompiling code or rebooting the VRs. I believe that's part of a bigger issue though, as I've seen some discussion about it on the list.

If anyone actually makes it through this, I'd appreciate any feedback on things I may not be considering, or reasons not to implement these changes. I doubt I'll get enough traction for an actual developer to help, so I'll probably end up hacking these in myself and committing them. I just wanted to see what the community thought first.

Thank You,

Logan Barfield
Tranquil Hosting