cloudstack-dev mailing list archives

From Wido den Hollander <w...@widodh.nl>
Subject Re: VXLAN and KVm experiences
Date Wed, 02 Jan 2019 09:58:01 GMT
Hi,

On 12/28/18 5:43 PM, Ivan Kudryavtsev wrote:
> Wido, that's interesting. 
> 
> Do you think that the Cumulus-based switches with BGP inside have
> advantage over classic OSPF-based routing switches and separate multihop
> MP BGP route-servers for VNI propagation? 
> 

I don't know. We do not use OSPF anywhere in our network. We are an
iBGP-only network.

We want to use as much open software as possible: buy the switches we
like and then add ONIE-based software such as Cumulus.

> I'm thinking about pure L3 OSPF-based backend networks for management
> and storage where cloudstack uses bridges on dummy interfaces with IP
> assigned while real NICS use utility IP-addresses in several OSPF
> networks and all those target IPs are distributed with OSPF. 
> 
> Next, VNI-s are created over bridges and their information is
> distributed over BGP. 
> 
> This approach helps to implement fault tolerance and multi-path routes
> with standard L3 stack without xSTP, VCS, etc, decrease broadcast domains.
> 
> Any thoughts?
> 

I wouldn't know for sure, we haven't looked into this yet.

Again, our plan (not set in stone) is:

- Unnumbered BGP (IPv6 Link Local) to all Hypervisors
- Link balancing using ECMP
- BGP+EVPN for VXLAN VNI distribution
- Use a static VNI for CloudStack POD IPv4
- Adapt the *modifyvxlan.sh* script to suit our needs

This way the transport of traffic will all be done in an IPv6-only
fashion.
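To make the last bullet concrete, a replacement *modifyvxlan.sh* for the EVPN case could be sketched roughly like this. This is untested and hypothetical: the `local` address, the `nolearning` flag, and the naming are assumptions, not our final script; FRR's EVPN configuration is assumed to be in place separately.

```shell
#!/bin/sh
# Hypothetical sketch of an EVPN-flavoured modifyvxlan.sh replacement.
# CloudStack passes the VNI; naming mirrors the brvx-<VNI>/vxlan<VNI>
# convention seen elsewhere in this thread.
VNI="$1"
DEV="vxlan${VNI}"
BRIDGE="brvx-${VNI}"

# No multicast 'group': with BGP+EVPN, FRR distributes MAC/VTEP
# reachability, so the device uses 'nolearning' instead of flood-and-learn.
# 2001:db8::1 is a placeholder for the hypervisor's loopback/underlay IP.
ip link add "$DEV" type vxlan id "$VNI" dstport 4789 \
    local 2001:db8::1 nolearning

ip link add name "$BRIDGE" type bridge
ip link set "$DEV" master "$BRIDGE"
ip link set "$DEV" up
ip link set "$BRIDGE" up
```

This is a configuration fragment that needs root and a live kernel to run; the point is only that the script receives the VNI and can do whatever the local environment requires.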

IPv4 to the hypervisors (POD Traffic and NFS SS) is all done by a VXLAN
device we create manually on them.
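For example, creating that device by hand could look something like this (hypothetical; VNI 100 and both addresses are made-up placeholders):

```shell
# Manually created VXLAN device carrying POD IPv4 over the IPv6 underlay.
# VNI 100, the local IPv6 address and the POD IPv4 address are examples.
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 2001:db8:ffff::1 nolearning
ip link set vxlan100 up
ip addr add 10.1.32.5/20 dev vxlan100
```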

Wido

> 
> Fri, Dec 28, 2018 at 05:34, Wido den Hollander <wido@widodh.nl>:
> 
> 
> 
>     On 10/23/18 2:54 PM, Ivan Kudryavtsev wrote:
>     > Doesn't a solution like this work seamlessly for large VXLAN networks?
>     >
>     > https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
>     >
> 
>     This is what we are looking into right now.
> 
>     As CloudStack executes *modifyvxlan.sh* prior to starting an Instance it
>     would be just a matter of replacing this script with a version which
>     does the EVPN for us.
> 
>     Our routers will probably be 36x100G SuperMicro Bare Metal switches
>     running Cumulus.
> 
>     Using unnumbered BGP over IPv6 we'll provide network connectivity to the
>     Hypervisors.
> 
>     Using FRR and EVPN we'll be able to enable VXLAN on the hypervisors and
>     route traffic.
> 
>     As these things seem to be very use-case specific I don't see how we can
>     integrate this into CloudStack in a generic way.
> 
>     The *modifyvxlan.sh* script gets the VNI as an argument, so anybody can
>     adapt it to their own needs for their specific environment.
> 
>     Wido
> 
>     > Tue, Oct 23, 2018, 8:34 Simon Weller <sweller@ena.com.invalid>:
>     >
>     >> Linux native VXLAN uses multicast, and each host has to participate
>     >> in multicast in order to see the VXLAN networks. We haven't tried
>     >> using PIM across an L3 boundary with ACS, although it will probably
>     >> work fine.
>     >>
>     >> Another option is to use an L3 VTEP, but right now there is no native
>     >> support for that in CloudStack's VXLAN implementation, although we've
>     >> thought about proposing it as a feature.
>     >>
>     >>
>     >> ________________________________
>     >> From: Wido den Hollander <wido@widodh.nl>
>     >> Sent: Tuesday, October 23, 2018 7:17 AM
>     >> To: dev@cloudstack.apache.org; Simon Weller
>     >> Subject: Re: VXLAN and KVm experiences
>     >>
>     >>
>     >>
>     >> On 10/23/18 1:51 PM, Simon Weller wrote:
>     >>> We've also been using VXLAN on KVM for all of our isolated VPC guest
>     >>> networks for quite a long time now. As Andrija pointed out, make sure
>     >>> you increase the max_igmp_memberships param, and also put an IP
>     >>> address on each host's VXLAN interface, in the same subnet for all
>     >>> hosts that will share networking, or multicast won't work.
>     >>>
>     >>
>     >> Thanks! So you are saying that all hypervisors need to be in the
>     >> same L2 network or are you routing the multicast?
>     >>
>     >> My idea was that each POD would be an isolated Layer 3 domain and
>     >> that a VNI would span over the different Layer 3 networks.
>     >>
>     >> I don't like STP and other Layer 2 loop-prevention systems.
>     >>
>     >> Wido
>     >>
>     >>>
>     >>> - Si
>     >>>
>     >>>
>     >>> ________________________________
>     >>> From: Wido den Hollander <wido@widodh.nl>
>     >>> Sent: Tuesday, October 23, 2018 5:21 AM
>     >>> To: dev@cloudstack.apache.org
>     >>> Subject: Re: VXLAN and KVm experiences
>     >>>
>     >>>
>     >>>
>     >>> On 10/23/18 11:21 AM, Andrija Panic wrote:
>     >>>> Hi Wido,
>     >>>>
>     >>>> I have "pioneered" this one in production for the last 3 years (and
>     >>>> suffered the nasty pain of silent packet drops on kernel 3.X back in
>     >>>> the day, because I was unaware of the max_igmp_memberships kernel
>     >>>> parameter, so I updated the manual a long time ago).
>     >>>>
>     >>>> I have never had any issues (besides the nasty one above...) and it
>     >>>> works very well.
>     >>>
>     >>> That's what I want to hear!
>     >>>
>     >>>> To avoid the issue I described above, you should increase
>     >>>> max_igmp_memberships (/proc/sys/net/ipv4/igmp_max_memberships) -
>     >>>> otherwise, with more than 20 VXLAN interfaces, some of them will
>     >>>> stay in the down state and hard-drop traffic (with a proper message
>     >>>> in agent.log) on kernel > 4.0 (or silently drop random packets on
>     >>>> kernel 3.X...) - and also pay attention to the MTU size. Anyway,
>     >>>> everything is in the manual (I updated everything I thought was
>     >>>> missing), so please check it.
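In practice the tuning above amounts to something like the following (200 is just an example value; size it to your number of VNIs):

```shell
# Linux defaults to 20 IGMP memberships per socket; each multicast-based
# VXLAN device joins one group, so more than 20 VNIs need a higher limit.
# The value 200 is an arbitrary example.
sysctl -w net.ipv4.igmp_max_memberships=200

# Persist across reboots:
echo 'net.ipv4.igmp_max_memberships = 200' > /etc/sysctl.d/90-vxlan.conf
```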
>     >>>>
>     >>>
>     >>> Yes, the underlying network will all be 9000 bytes MTU.
>     >>>
>     >>>> Our example setup:
>     >>>>
>     >>>> We have e.g. bond0.950 as the main VLAN which will carry all the
>     >>>> VXLAN "tunnels", so this is defined as the KVM traffic label. In
>     >>>> our case it didn't make sense to use a bridge on top of this
>     >>>> bond0.950 (as the traffic label) - you can test it on your own -
>     >>>> since this bridge is used only to extract the child bond0.950
>     >>>> interface name. Then, based on the VXLAN ID, ACS will provision
>     >>>> vxlanYYY@bond0.xxx and join this new vxlan interface to a NEW
>     >>>> bridge it creates (and then of course the vNIC goes to this new
>     >>>> bridge), so the original bridge (to which bond0.xxx belonged) is
>     >>>> not used for anything.
>     >>>>
>     >>>
>     >>> Clear, I indeed thought something like that would happen.
>     >>>
>     >>>> Here is sample from above for vxlan 867 used for tenant isolation:
>     >>>>
>     >>>> root@hostname:~# brctl show brvx-867
>     >>>>
>     >>>> bridge name     bridge id               STP enabled     interfaces
>     >>>> brvx-867        8000.2215cfce99ce       no              vnet6
>     >>>>                                                         vxlan867
>     >>>>
>     >>>> root@hostname:~# ip -d link show vxlan867
>     >>>>
>     >>>> 297: vxlan867: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8142 qdisc
>     >>>> noqueue master brvx-867 state UNKNOWN mode DEFAULT group default
>     >>>> qlen 1000
>     >>>>     link/ether 22:15:cf:ce:99:ce brd ff:ff:ff:ff:ff:ff promiscuity 1
>     >>>>     vxlan id 867 group 239.0.3.99 dev bond0.950 port 0 0 ttl 10
>     >>>>     ageing 300
>     >>>>
>     >>>> root@ix1-c7-2:~# ifconfig bond0.950 | grep MTU
>     >>>>           UP BROADCAST RUNNING MULTICAST  MTU:8192  Metric:1
>     >>>>
>     >>>> So note how the vxlan interface has an MTU 50 bytes smaller than
>     >>>> the bond0.950 parent interface (which could affect traffic inside
>     >>>> the VM) - so jumbo frames are needed anyway on the parent interface
>     >>>> (bond0.950 in the example above, with a minimum MTU of 1550).
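The MTU arithmetic above can be sketched as follows (the 50-byte figure assumes an IPv4 underlay; an IPv6 underlay adds another 20 bytes):

```shell
# VXLAN overhead over an IPv4 underlay: outer Ethernet (14) + IPv4 (20)
# + UDP (8) + VXLAN header (8) = 50 bytes.
PARENT_MTU=8192
OVERHEAD=50
VXLAN_MTU=$((PARENT_MTU - OVERHEAD))
echo "$VXLAN_MTU"   # 8142, matching the ip -d link output above
```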
>     >>>>
>     >>>
>     >>> Yes, thanks! We will be using 1500 MTU inside the VMs, so all the
>     >>> networks underneath will be ~9k.
>     >>>
>     >>>> Ping me if more details needed, happy to help.
>     >>>>
>     >>>
>     >>> Awesome! We'll be doing a PoC rather soon. I'll come back with our
>     >>> experiences later.
>     >>>
>     >>> Wido
>     >>>
>     >>>> Cheers
>     >>>> Andrija
>     >>>>
>     >>>> On Tue, 23 Oct 2018 at 08:23, Wido den Hollander <wido@widodh.nl>
>     >>>> wrote:
>     >>>>
>     >>>>> Hi,
>     >>>>>
>     >>>>> I just wanted to know if there are people out there using KVM
>     >>>>> with Advanced Networking and using VXLAN for different networks.
>     >>>>>
>     >>>>> Our main goal would be to spawn a VM and, based on the network the
>     >>>>> NIC is in, attach it to a different VXLAN bridge on the KVM host.
>     >>>>>
>     >>>>> It seems to me that this should work, but I just wanted to check
>     >>>>> and see if people have experience with it.
>     >>>>>
>     >>>>> Wido
>     >>>>>
>     >>>>
>     >>>>
>     >>>
>     >>
>     >
> 
> 
> 
> -- 
> With best regards, Ivan Kudryavtsev
> Bitworks LLC
> Cell RU: +7-923-414-1515
> Cell USA: +1-201-257-1512
> WWW: http://bitworks.software/ <http://bw-sw.com/>
> 
