cloudstack-dev mailing list archives

From Rohit Yadav <>
Subject Re: VXLAN and KVm experiences
Date Mon, 19 Nov 2018 13:36:39 GMT

I need some pointers around vxlan debugging and configuration: (sorry for the long email)

I'm working on a concept CI system where the idea is to set up CloudStack with KVM hosts and
use VXLAN isolation for guest, mgmt and public networks, and then run CI jobs as CloudStack
projects where monkeybox VMs (nested KVM VMs) run in isolated networks and are used to test
a CloudStack build/branch/PR.

I have two Ubuntu 18.04.1 based i7 mini PCs running KVM, with a single bridge/NIC, cloudbr0,
carrying the public, guest and mgmt networks, all VXLAN based. I've set igmp_max_memberships
to 200, and to see the console proxy etc. I used vxlan://untagged for the public IP address range.
The gigabit switch between them does not support IGMP snooping. The problem is that in
the nested VMs in an isolated network (the VR's public NIC plugs into cloudbr0, and its guest NIC
plugs into a bridge that has a VXLAN endpoint for some VNI), the download speed from the public
network is very slow. I've enabled the default UDP port for VXLAN on both hosts. How do I debug
VXLANs, and what's going wrong? (Note that I have a single bridge for all those networks, with no VLANs.)

Rohit Yadav
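
A few read-only checks that could serve as a starting point for this kind of debugging are sketched below. This is only a sketch under stated assumptions: the helper names are made up for illustration, the peer address 192.0.2.11 is a placeholder, and the 50-byte overhead assumes VXLAN over IPv4 without an outer VLAN tag. An MTU mismatch is one possible cause of slow transfers, not a confirmed diagnosis for this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; nothing here changes host configuration.

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# MTU left for the VXLAN device after encapsulation over IPv4:
# 14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4) = 50 bytes.
vxlan_inner_mtu() {
    echo $(( $1 - 50 ))
}

# Usage: vxlan_checks <parent-interface> <peer-host-ip>
vxlan_checks() {
    local parent="$1" peer="$2" mtu
    mtu=$(cat "/sys/class/net/$parent/mtu") || return 1
    echo "parent MTU: $mtu, expected VXLAN device MTU: $(vxlan_inner_mtu "$mtu")"

    sysctl net.ipv4.igmp_max_memberships   # current IGMP membership limit
    ip -d link show type vxlan             # VNI, group and MTU of each VXLAN device
    ip maddr show dev "$parent"            # multicast groups joined on the parent

    # Ping the other host with "don't fragment" set; this fails if the
    # path between the hosts cannot carry a full-size frame.
    ping -c 3 -M do -s "$(max_ping_payload "$mtu")" "$peer"

    # To watch encapsulated traffic, something like the following can help
    # (8472 is the Linux default VXLAN port, 4789 the IANA-assigned one):
    #   tcpdump -ni "$parent" 'udp port 8472 or udp port 4789'
}
```

It would be called as, for example, `vxlan_checks cloudbr0 192.0.2.11` on each host, comparing the reported VXLAN device MTU with the MTU configured inside the guests.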

From: Simon Weller <>
Sent: Wednesday, November 14, 2018 10:55:18 PM
To: Wido den Hollander;
Subject: Re: VXLAN and KVm experiences


Here is the original document on the implementation of VXLAN in ACS -

It may shed some light on the reasons for the different multicast groups.

- Si

From: Wido den Hollander <>
Sent: Tuesday, November 13, 2018 4:40 AM
To:; Simon Weller
Subject: Re: VXLAN and KVm experiences

On 10/23/18 2:34 PM, Simon Weller wrote:
> Linux native VXLAN uses multicast and each host has to participate in multicast in order
> to see the VXLAN networks. We haven't tried using PIM across a L3 boundary with ACS, although
> it will probably work fine.
> Another option is to use a L3 VTEP, but right now there is no native support for that
> in CloudStack's VXLAN implementation, although we've thought about proposing it as a feature.

Getting back to this I see CloudStack does this:

local mcastGrp="239.$(( ($vxlanId >> 16) % 256 )).$(( ($vxlanId >> 8) % 256 )).$(( $vxlanId % 256 ))"

VNI 1000 would use group 239.0.3.232 and VNI 1001 would use 239.0.3.233.
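
To make the mapping concrete, the same arithmetic can be wrapped in a small function and run for a couple of VNIs. This is a minimal sketch; the function name vni_to_mcast_group is made up for illustration and is not taken from the CloudStack scripts.

```shell
#!/usr/bin/env bash
# Map a VNI to a multicast group using the same arithmetic as the line
# quoted above: the three low bytes of the VNI become the last three
# octets of a 239.x.y.z address.
vni_to_mcast_group() {
    local vxlanId="$1"
    echo "239.$(( ($vxlanId >> 16) % 256 )).$(( ($vxlanId >> 8) % 256 )).$(( $vxlanId % 256 ))"
}

for vni in 1000 1001; do
    printf 'VNI %s -> %s\n' "$vni" "$(vni_to_mcast_group "$vni")"
done
# VNI 1000 -> 239.0.3.232
# VNI 1001 -> 239.0.3.233
```

Since a VNI is 24 bits wide, every VNI ends up with its own group inside 239.0.0.0/8.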

Why are we using a different mcast group for every VNI? As the VNI is
encoded in the packet this should just work in one group, right?

Because this way you need to configure all those groups on your
Router(s) as each VNI will use a different Multicast Group.

I'm just looking for the reason why we have these different multicast groups.

I was thinking that we might want to add an option
where we allow users to set a fixed multicast group for all traffic.
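
One way such an option could look is sketched below. It is purely hypothetical: VXLAN_FIXED_MCAST_GROUP and select_mcast_group are invented names for illustration, not existing CloudStack settings or functions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: if VXLAN_FIXED_MCAST_GROUP (an invented variable)
# is set, use it for every VNI; otherwise fall back to the per-VNI group
# derived with the arithmetic quoted earlier in this message.
select_mcast_group() {
    local vxlanId="$1"
    if [ -n "${VXLAN_FIXED_MCAST_GROUP:-}" ]; then
        echo "$VXLAN_FIXED_MCAST_GROUP"
    else
        echo "239.$(( ($vxlanId >> 16) % 256 )).$(( ($vxlanId >> 8) % 256 )).$(( $vxlanId % 256 ))"
    fi
}
```

With the variable unset the behaviour stays as it is today, so existing deployments would not be affected; with it set, routers would only need to handle a single group.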



> ________________________________
> From: Wido den Hollander <>
> Sent: Tuesday, October 23, 2018 7:17 AM
> To:; Simon Weller
> Subject: Re: VXLAN and KVm experiences
> On 10/23/18 1:51 PM, Simon Weller wrote:
>> We've also been using VXLAN on KVM for all of our isolated VPC guest networks for
>> quite a long time now. As Andrija pointed out, make sure you increase the igmp_max_memberships
>> param and also put an IP address on each host's VXLAN interface, in the same subnet
>> for all hosts that will share networking, or multicast won't work.
> Thanks! So you are saying that all hypervisors need to be in the same L2
> network or are you routing the multicast?
> My idea was that each POD would be an isolated Layer 3 domain and that a
> VNI would span over the different Layer 3 networks.
> I don't like STP and other Layer 2 loop-prevention systems.
> Wido
>> - Si
>> ________________________________
>> From: Wido den Hollander <>
>> Sent: Tuesday, October 23, 2018 5:21 AM
>> To:
>> Subject: Re: VXLAN and KVm experiences
>> On 10/23/18 11:21 AM, Andrija Panic wrote:
>>> Hi Wido,
>>> I have "pioneered" this one in production for the last 3 years (and suffered the
>>> nasty pain of silently dropped packets on kernel 3.X back in the day
>>> because I was unaware of the igmp_max_memberships kernel parameter, so I
>>> updated the manual a long time ago).
>>> I never had any issues (beside above nasty one...) and it works very well.
>> That's what I want to hear!
>>> To avoid the issue I described above, you should increase
>>> igmp_max_memberships (/proc/sys/net/ipv4/igmp_max_memberships); otherwise,
>>> with more than 20 vxlan interfaces, some of them will stay in down state
>>> and have a hard traffic drop (with a proper message in agent.log) with kernel
>>> >4.0 (or a silent, bitchy random packet drop on kernel 3.X...). Also
>>> pay attention to MTU size; anyway, everything is in the manual (I
>>> updated everything I thought was missing), so please check it.
>> Yes, the underlying network will all be 9000 bytes MTU.
>>> Our example setup:
>>> We have, e.g., bond0.950 as the main VLAN which will carry all vxlan "tunnels"
>>> - so this is defined as KVM traffic label. In our case it didn't make sense
>>> to use bridge on top of this bond0.950 (as the traffic label) - you can
>>> test it on your own - since this bridge is used only to extract child
>>> bond0.950 interface name, then based on vxlan ID, ACS will provision
>>> and join this new vxlan interface to NEW bridge created
>>> (and then of course vNIC goes to this new bridge), so original bridge (to
>>> which belonged) is not used for anything.
>> Clear, I indeed thought something like that would happen.
>>> Here is sample from above for vxlan 867 used for tenant isolation:
>>> root@hostname:~# brctl show brvx-867
>>> bridge name     bridge id               STP enabled     interfaces
>>> brvx-867                8000.2215cfce99ce       no              vnet6
>>>      vxlan867
>>> root@hostname:~# ip -d link show vxlan867
>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc noqueue
>>> master brvx-867 state UNKNOWN mode DEFAULT group default qlen 1000
>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>>>     vxlan id 867 group dev bond0.950 port 0 0 ttl 10 ageing 300
>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>>> So note how the vxlan interface has an MTU 50 bytes smaller than the
>>> bond0.950 parent interface (which could affect traffic inside the VM), so
>>> jumbo frames are needed anyway on the parent interface (bond0.950 in the example
>>> above, with a minimum MTU of 1550).
>> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
>> networks underneath will be ~9k.
>>> Ping me if more details needed, happy to help.
>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
>> experiences later.
>> Wido
>>> Cheers
>>> Andrija

> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <> wrote:
>>>> Hi,
>>>> I just wanted to know if there are people out there using KVM with
>>>> Advanced Networking and using VXLAN for different networks.
>>>> Our main goal would be to spawn a VM and based on the network the NIC is
>>>> in attach it to a different VXLAN bridge on the KVM host.
>>>> It seems to me that this should work, but I just wanted to check and see
>>>> if people have experience with it.
>>>> Wido
