incubator-cloudstack-dev mailing list archives

From Chiradeep Vittal <Chiradeep.Vit...@citrix.com>
Subject Re: [RFC] QinQ vlans support
Date Mon, 22 Oct 2012 05:41:25 GMT
+1 on the FS.

On 10/20/12 10:52 PM, "Marcus Sorensen" <shadowsor@gmail.com> wrote:

>The admin does have to create a new physical network, the patch just
>allows you to use a tagged network as that physical network rather
>than a real eth device. It is true that cloudstack doesn't know about
>q-in-q per se, but it is the one creating the q-in-q vlans. The admin
>does have to create any "vlan#" devs to be used, but I think that
>makes sense since cloudstack doesn't manage any of your physical
>network devices. Perhaps I need to write a bit of a functional spec
>just to describe it in more detail.
>
>I haven't done anything with it in regards to xen, of course that
>would also be a different patch since it hits different code. If
>someone knows that code well maybe they can help. This is a simple
>patch, but it's made possible by a previous patch that reworks how the
>bridges are named, so enabling it for xen might not be as simple as
>this makes it look.
>
>On Sat, Oct 20, 2012 at 10:57 PM, Chiradeep Vittal
><Chiradeep.Vittal@citrix.com> wrote:
>> It looks like your patch does not require the admin to configure anything
>> wrt physical networks. The admin knows the list of "outer" VLANs and
>> CloudStack is blissfully unaware of the QinQ stuff.
>> This requires the hypervisors to be independently configured (out-of-band)
>> with the outer VLAN bridges?
>> It also looks like this is a KVM-only solution.
>> Have you tried this with XS?
>>
>> On 10/18/12 6:21 PM, "Marcus Sorensen" <shadowsor@gmail.com> wrote:
>>
>>>Ah, well it's pretty simple, so I'll just paste it here. Again,
>>>perhaps more should be implemented regarding the MTU (like
>>>functionality to configure MTU on the virtual router), but if you know
>>>what to do it can all work via switch configs.
>>>
>>>diff --git a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>index 1bc70fa..70de3db 100755
>>>--- a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>+++ b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>@@ -800,7 +800,7 @@ public class LibvirtComputingResource extends ServerResourceBase implements
>>>         String pif = Script.runSimpleBashScript("brctl show | grep " + bridge + " | awk '{print $4}'");
>>>         String vlan = Script.runSimpleBashScript("ls /proc/net/vlan/" + pif);
>>>
>>>-        if (vlan != null && !vlan.isEmpty()) {
>>>+        if (vlan != null && !vlan.isEmpty() && (!pif.startsWith("vlan") || pif.matches("vlan\\d+\\.\\d+"))) {
>>>                 pif = Script.runSimpleBashScript("grep ^Device\\: /proc/net/vlan/" + pif + " | awk {'print $2'}");
>>>         }
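
The condition added by the patch above can be exercised on its own. The following is a minimal sketch, not the CloudStack code: the class name, the method name, and the isVlanDevice parameter (standing in for the "ls /proc/net/vlan/<pif>" check) are assumptions made for illustration. Only the startsWith/matches expression is taken from the diff.

```java
import java.util.regex.Pattern;

// Hypothetical helper; only the boolean expression comes from the patch.
final class VlanParentLookup {
    // Same pattern as in the patch: a double-tagged device such as vlan100.10.
    private static final Pattern DOUBLE_TAGGED = Pattern.compile("vlan\\d+\\.\\d+");

    /**
     * Returns true when the parent device of pif should be looked up,
     * i.e. pif is a tagged device but not a single-tagged "vlan#" device.
     * isVlanDevice is an assumed stand-in for the /proc/net/vlan check.
     */
    public static boolean shouldResolveParent(String pif, boolean isVlanDevice) {
        if (!isVlanDevice) {
            return false;
        }
        return !pif.startsWith("vlan") || DOUBLE_TAGGED.matcher(pif).matches();
    }

    public static void main(String[] args) {
        for (String pif : new String[] {"eth1.102", "vlan100", "vlan100.10"}) {
            System.out.println(pif + " -> " + shouldResolveParent(pif, true));
        }
    }
}
```

Under these assumptions, eth1.102 and vlan100.10 are resolved to their parent, while vlan100 is kept and treated as if it were a physical device.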
>>>
>>>On Thu, Oct 18, 2012 at 8:05 AM, Chip Childers
>>><chip.childers@sungard.com> wrote:
>>>> On Thu, Oct 18, 2012 at 12:42 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>> Sorry, I've been up to my ears. I've attached the simple patch that
>>>>> makes this all happen, if anyone wants to take a look. This is the
>>>>> code that looks for physical devices. It's passed a bridge and then
>>>>> determines the parent of that bridge, then whether that parent is a
>>>>> tagged device and goes one more step and finds its parent. This just
>>>>> circumvents the last lookup if the parent of the bridge is a "vlan"
>>>>> device (single tagged, e.g. vlan100) but not a double-tagged one
>>>>> (e.g. vlan100.10), and the rest of cloudstack treats vlan100 as though it
>>>>> were a physical device, creates tagged bridges on it if it has guest
>>>>> traffic type, etc. I've been using it in our test bed for about a
>>>>> month, and have only run into the MTU issue.
>>>>
>>>> Hey Marcus,
>>>>
>>>> Attachments get stripped.  Can you post it somewhere?
>>>>
>>>>> If people still think it's a good idea, I'll create a functional spec
>>>>> and additional info on how it works.
>>>>>
>>>>> I've also got a small patch to modifyvlans.sh, but I'm debating
>>>>> whether or not it's necessary. It detects whether the "physical
>>>>> interface" is actually a vlan tagged interface, and if so it subtracts
>>>>> the necessary bytes from the MTU when it sets up the double-tagged
>>>>> bridges. It's technically not necessary, as the important part is
>>>>> whether the guest MTUs fit inside the MTU that the switch allows once
>>>>> the extra tag is added. But it just makes it a bit more obvious as to
>>>>> what's needed. However it also breaks the admin's ability to bump the
>>>>> switch MTUs up just a bit, say 1532, to account for the excess without
>>>>> having to go up to 9000 or full jumbo. If anyone is a network guru and
>>>>> has any feedback it would be appreciated, but I'm inclined to leave
>>>>> the MTUs alone and write it into the functional spec that a switch
>>>>> with a 1500 MTU supports double tags up to 1468, and a switch with a
>>>>> 9000 MTU supports VM guest networks up to 8968 MTU.
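
The MTU figures quoted above (1468 on a 1500 switch, 8968 on a 9000 switch, or bumping the switch to 1532) all correspond to setting aside 32 bytes. A single 802.1Q tag is 4 bytes, so the sketch below leaves the reserve as a parameter rather than fixing it. The class and method names are hypothetical and this is only the arithmetic, not anything modifyvlans.sh does.

```java
// Hypothetical helper illustrating the MTU arithmetic discussed in the thread.
final class QinqMtu {
    /** Size of one 802.1Q VLAN tag in bytes. */
    public static final int DOT1Q_TAG_BYTES = 4;

    /** Largest guest MTU that fits once reserveBytes of the switch MTU are set aside. */
    public static int maxGuestMtu(int switchMtu, int reserveBytes) {
        if (reserveBytes < 0 || reserveBytes >= switchMtu) {
            throw new IllegalArgumentException("reserve must be in [0, switchMtu)");
        }
        return switchMtu - reserveBytes;
    }

    /** Switch MTU needed so that guests can keep guestMtu unchanged. */
    public static int requiredSwitchMtu(int guestMtu, int reserveBytes) {
        if (reserveBytes < 0) {
            throw new IllegalArgumentException("reserve must not be negative");
        }
        return guestMtu + reserveBytes;
    }

    public static void main(String[] args) {
        int reserve = 32; // the headroom implied by the figures in the thread
        System.out.println(maxGuestMtu(1500, reserve));       // 1468
        System.out.println(maxGuestMtu(9000, reserve));       // 8968
        System.out.println(requiredSwitchMtu(1500, reserve)); // 1532
        // With exactly one extra 4-byte tag as the reserve:
        System.out.println(maxGuestMtu(1500, DOT1Q_TAG_BYTES)); // 1496
    }
}
```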
>>>>>
>>>>> On Mon, Oct 15, 2012 at 1:43 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>> Ok, I'll pull out the changes and let people see them. Cloudstack
>>>>>> seems to let me put the same vlan ranges on multiple physicals, though
>>>>>> I haven't done much actual testing with large numbers of vlans. I
>>>>>> imagine there would be other bottlenecks if they all needed to be up
>>>>>> on the same host at once. Luckily we only create bridges for the
>>>>>> actual VMs on the box so it should scale reasonably.
>>>>>>
>>>>>> The only caveat I've run into so far is that you either need to be
>>>>>> running jumbo frames on your switches, or turn down the MTU on the
>>>>>> guests a bit to accommodate the space taken by the extra tag.  If you
>>>>>> wanted to run jumbo frames on the guests as well, you'd run into the
>>>>>> same situation and have to use slightly less than the 9000 (although
>>>>>> the virtual router would require a patch too for the new size).
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 9:56 AM, Ahmad Emneina
>>>>>><Ahmad.Emneina@citrix.com> wrote:
>>>>>>> On 10/15/12 8:35 AM, "Kelceydamage@bbits" <kelcey@bbits.ca> wrote:
>>>>>>>
>>>>>>>>That's a far more elegant way than I tried, which was creating tagged
>>>>>>>>interfaces within guests.
>>>>>>>>
>>>>>>>>Sent from my iPhone
>>>>>>>>
>>>>>>>>On Oct 15, 2012, at 12:54 AM, Chiradeep Vittal
>>>>>>>><Chiradeep.Vittal@citrix.com> wrote:
>>>>>>>>
>>>>>>>>> This sounds like it can be modeled as multiple physical networks?
>>>>>>>>> That is, each "outer" vlan (400, 401, etc) is a separate physical
>>>>>>>>> network in the same zone. That could work, although it is probable
>>>>>>>>> that the zone configuration API bits prevent more than 4k VLANs per
>>>>>>>>> zone (that can be changed to per physical network).
>>>>>>>>>
>>>>>>>>> As long as communication between guests on different physical
>>>>>>>>> networks happens via the public network, it should be Ok.
>>>>>>>>> I'd like to see the patch.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On 10/12/12 1:09 AM, "Marcus Sorensen" <shadowsor@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Guys, in looking for a free and scalable way to provide private
>>>>>>>>>> networks for customers I've been running a QinQ setup that has been
>>>>>>>>>> working quite well. I've sort of laid the groundwork for it already
>>>>>>>>>> in changing the bridge naming conventions about a month ago for KVM
>>>>>>>>>> (to names that won't collide if the same vlan is used twice on
>>>>>>>>>> different phys).
>>>>>>>>>>
>>>>>>>>>> Basically the way it works is like this. Linux has two ways of
>>>>>>>>>> creating tagged networks, the eth#.# and the less used vlan# network
>>>>>>>>>> devices. I have a tiny patch that causes cloudstack to treat vlan#
>>>>>>>>>> devs as though they were physical NICs. In this way, you can do
>>>>>>>>>> something like physical devices eth0, eth1, and vlan400. Management
>>>>>>>>>> traffic on eth0's bridge, storage on eth1.102's bridge, maybe
>>>>>>>>>> eth1.103 for public/guest, then create say a vlan400 that is tag 400
>>>>>>>>>> on eth1. You add a traffic type of guest to it and give it a vlan
>>>>>>>>>> range, say 10-4000. Then you end up with cloudstack handing out
>>>>>>>>>> vlan400.10, vlan400.11, etc for guest networks. Works great for
>>>>>>>>>> network isolation without burning through a bunch of your "real"
>>>>>>>>>> vlans. In the unlikely event that you run out, you just create a
>>>>>>>>>> physical vlan401 and start over with the vlan numbers.
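
The naming scheme described above (outer device vlan400 plus a guest range of 10-4000 yielding vlan400.10, vlan400.11, and so on) can be sketched as follows. This is an illustration under assumptions, not CloudStack's allocation code: the class and method names are hypothetical, and the capacity helper simply multiplies the number of outer devices by the size of the inner range.

```java
// Hypothetical helper illustrating the outer/inner device naming from the thread.
final class QinqDeviceNames {
    // Valid 802.1Q VLAN IDs are 1 through 4094.
    private static void checkVlanId(int id) {
        if (id < 1 || id > 4094) {
            throw new IllegalArgumentException("VLAN ID out of range: " + id);
        }
    }

    /** Name of the double-tagged device for an inner VLAN on an outer vlan# device. */
    public static String innerDevice(String outerDevice, int innerVlan) {
        checkVlanId(innerVlan);
        return outerDevice + "." + innerVlan;
    }

    /** Number of isolated guest networks available across outerCount outer devices. */
    public static long capacity(int outerCount, int rangeStart, int rangeEnd) {
        checkVlanId(rangeStart);
        checkVlanId(rangeEnd);
        if (outerCount < 0 || rangeEnd < rangeStart) {
            throw new IllegalArgumentException("invalid outer count or range");
        }
        return (long) outerCount * (rangeEnd - rangeStart + 1);
    }

    public static void main(String[] args) {
        System.out.println(innerDevice("vlan400", 10)); // vlan400.10
        System.out.println(innerDevice("vlan400", 11)); // vlan400.11
        System.out.println(capacity(1, 10, 4000));      // 3991
        System.out.println(capacity(2, 10, 4000));      // 7982 (vlan400 and vlan401)
    }
}
```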
>>>>>>>>>>
>>>>>>>>>> In theory all-you-can-eat isolated networks without having to
>>>>>>>>>> configure hundreds of vlans on your networking equipment. This may
>>>>>>>>>> require additional config on any upstream switches to pass the
>>>>>>>>>> double tags around, but in general from what I've seen the inner
>>>>>>>>>> tags just pass through on anything layer 2, it should only get
>>>>>>>>>> tricky if you try to tunnel, route or strip tags.
>>>>>>>>>>
>>>>>>>>>> This is especially nice with system VM routers and VPC (cloudstack
>>>>>>>>>> takes care of everything), but admittedly external routers probably
>>>>>>>>>> will have spotty support for being able to route double tagged
>>>>>>>>>> stuff. I'm also a bit afraid that if I were to get it merged in that
>>>>>>>>>> it would just become this undocumented hack thing that few know
>>>>>>>>>> about and nobody uses. So I'm looking for feedback on whether this
>>>>>>>>>> sounds useful enough to commit, how it should be documented, and
>>>>>>>>>> whether it makes sense to hint at this in the GUI somehow.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> This actually sounds amazing Marcus. I'd love to see and use this
>>>>>>> implementation.
>>>>>>>
>>>>>>> --
>>>>>>> Æ
>>>>>>>
>>>>>>>
>>>>>>>
>>

