cloudstack-dev mailing list archives

From: Dag Sonstebo <Dag.Sonst...@shapeblue.com>
Subject: Re: Help/Advice needed - some traffic don't reach VNET / VM
Date: Mon, 09 Oct 2017 21:08:25 GMT
Hi Andrija,

Do you use NIC bonds? I have seen this before with active-active bonds, and as you say
it can be very difficult to troubleshoot, and the behaviour makes little sense. What can happen
is that network traffic is load balanced between the two NICs, but the MAC tables on the two
switches are not updated often enough to keep up with the load-balanced traffic. In other
words, a MAC address that used to transmit on hypervisor eth0 of a bond (attached to your first
top-of-rack switch) has suddenly, due to load, started transmitting on eth1 of the bond (attached
to the second top-of-rack switch), but the physical switch stack still thinks the MAC address
lives on eth0, so traffic is dropped until the next time the switches sync their MAC tables.
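To make the failure mode above a little more concrete, here is a toy Python model of it. It is only a sketch for illustration, not how any particular switch implements MAC learning, and all names in it are hypothetical: the switch stack keeps a MAC-to-bond-member table that only refreshes on a "sync" event, while the host's bond can move a MAC to the other member at any time. An inbound frame counts as delivered only when the switches' possibly stale entry matches the member the MAC is currently using.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (kind, mac, bond_member); bond_member is only used by "tx" events
Event = Tuple[str, str, Optional[str]]


def simulate_stale_mac_table(events: Iterable[Event]) -> List[bool]:
    """Toy model: report, for each "rx" event, whether the frame was delivered.

    "tx":   the host's bond starts sending `mac` out of `bond_member`
    "sync": the switch stack refreshes its MAC table from what it has seen
    "rx":   a frame for `mac` arrives; it is delivered only if the switches'
            (possibly stale) entry matches the member the MAC now uses
    """
    switch_table: Dict[str, str] = {}  # switches' view: MAC -> bond member
    current: Dict[str, str] = {}       # member each MAC actually uses right now
    delivered: List[bool] = []
    for kind, mac, member in events:
        if kind == "tx":
            if member is None:
                raise ValueError("a tx event needs a bond member")
            current[mac] = member
        elif kind == "sync":
            switch_table.update(current)
        elif kind == "rx":
            learned = switch_table.get(mac)
            delivered.append(learned is not None and learned == current.get(mac))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return delivered


vm_mac = "52:54:00:aa:bb:cc"  # hypothetical VM MAC
events = [
    ("tx", vm_mac, "eth0"),
    ("sync", vm_mac, None),
    ("rx", vm_mac, None),    # delivered
    ("tx", vm_mac, "eth1"),  # load balancing moves the MAC to eth1
    ("rx", vm_mac, None),    # dropped: the switches still think eth0
    ("sync", vm_mac, None),
    ("rx", vm_mac, None),    # delivered again after the sync
]
print(simulate_stale_mac_table(events))  # [True, False, True]
```

In this model, an active-passive bond corresponds to never emitting a second "tx" on the other member while the first one is healthy, so the stale window never opens.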

We used to see this a lot in the past on XenServer; the solution was to move to active-passive
bond mode, or up to LACP/802.3ad if your hardware allows for it. The same principle also
applies to generic Linux bonds.
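For a generic Linux bond, one way to check which mode it is actually in is the "Bonding Mode:" line of /proc/net/bonding/<bond>. The sketch below assumes the mode strings printed by the kernel bonding driver (e.g. "fault-tolerance (active-backup)" and "IEEE 802.3ad Dynamic link aggregation") and buckets everything else as active-active. The function names and the bond name bond0 are hypothetical placeholders, and OVS bonds (as on XenServer) are not covered.

```python
from pathlib import Path
from typing import Optional


def bonding_mode(proc_text: str) -> Optional[str]:
    """Return the text after "Bonding Mode:" in /proc/net/bonding/<bond> contents."""
    for line in proc_text.splitlines():
        if line.startswith("Bonding Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def classify_mode(mode: Optional[str]) -> str:
    """Bucket a Linux bonding-driver mode into the categories discussed above."""
    if mode is None:
        return "unknown"
    # startswith() tolerates any suffix the driver may append to the mode name
    if mode.startswith("fault-tolerance (active-backup)"):
        return "active-passive"
    if mode.startswith("IEEE 802.3ad"):
        return "lacp"
    # round-robin, xor, broadcast, tlb and alb can all put traffic
    # on more than one member NIC at the same time
    return "active-active"


def read_bond_mode(bond: str = "bond0") -> Optional[str]:
    """Read a live bond's mode; `bond0` is a placeholder bond name."""
    return bonding_mode(Path("/proc/net/bonding", bond).read_text())


sample = "Bonding Mode: load balancing (round-robin)\nMII Status: up\n"
print(classify_mode(bonding_mode(sample)))  # active-active
```

On a real host, `classify_mode(read_bond_mode("bond0"))` would give the same bucket for the live bond, provided the bonding driver is loaded.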

Regards, 
Dag Sonstebo
Cloud Architect
ShapeBlue
S: +44 20 3603 0540 | dag.sonstebo@shapeblue.com | http://www.shapeblue.com
| Twitter: @ShapeBlue <https://twitter.com/#!/shapeblue>


On 09/10/2017, 21:52, "Andrija Panic" <andrija.panic@gmail.com> wrote:

    Hi guys,
    
    we have an occasional but serious problem that seems to start happening
    randomly (i.e. NOT under high load). It's not ACS-related AFAIK, purely KVM,
    but feedback is really welcome.
    
    - The VM is generally reachable from everywhere, but not from one
    specific IP address?!
    - The VM is NOT under high load: network traffic is next to zero, same for
    CPU/disk...
    - We mitigate this problem by migrating the VM away to another host, which
    is not much of a solution...
    
    Description of the problem:
    
    We run a ping from the "problematic" source IP address to the problematic
    VM, and we capture traffic on the KVM host where the problematic VM lives:
    
    - tcpdump on the VXLAN interface (physical incoming interface on the host):
    we see the packet fine
    - tcpdump on the BRIDGE: we see the packet fine
    - tcpdump on the VNET: we DON'T see the packet.
    
    In the scenario above, I should point out that:
    - we can tcpdump packets from other source IPs on the VNET interface just
    fine (as expected), so we should also see this problematic source IP's packets
    - we can actually ping in the opposite direction, from the problematic VM to
    the problematic "source" IP
    
    We checked everything possible: bridge port forwarding, MAC-to-VTEP
    mapping and many other things. We removed traffic shaping from the VNET
    interface, there are no iptables/ebtables rules and no STP on the bridge,
    we removed and rejoined interfaces to the bridge, and we even destroyed the
    bridge and recreated it manually on the fly.
    
    The problem is really crazy and I cannot explain it: there are no iptables
    and no ebtables rules (removed for troubleshooting purposes on this host).
    
    This is Ubuntu 14.04, QEMU 2.5 (libvirt 1.3.1),
    stock kernel 3.16-xx, regular Linux bridge (not OVS).
    
    Has anyone else ever heard of such a problem? This is not intermittent packet
    dropping, but a complete blackout/packet drop of some kind...
    
    Thanks,
    
    -- 
    
    Andrija Panić
    


53 Chandos Place, Covent Garden, London WC2N 4HS, UK