cloudstack-users mailing list archives

From ilya musayev <ilya.mailing.li...@gmail.com>
Subject Re: intermittent packet loss after upgrading and restarting networks
Date Mon, 18 Aug 2014 20:09:51 GMT
Or perhaps this one.

http://cloudstack.apache.org/docs/api/apidocs-4.4/root_admin/restartNetwork.html
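
If you go that route, a minimal sketch of driving it from cloudmonkey
(the UUID below is a placeholder; cleanup=true tears down and recreates
the network rules rather than just reprogramming them):

   # find the UUID of the affected network
   list networks listall=true filter=id,name,state

   # restart it, recreating the network elements
   restart network id=<network-uuid> cleanup=true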

On 8/18/14, 1:07 PM, ilya musayev wrote:
> Nick,
>
> I don't believe we throttle disks unless you have storage with direct 
> integration that limits IOPS, like SolidFire or possibly NetApp.
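>
> If you want to rule that out quickly, any IOPS caps set on disk 
> offerings should show up in a listing - a rough cloudmonkey sketch, 
> where the filter field names are from memory and may differ:
>
>    list diskofferings filter=name,miniops,maxiops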
>
> The change is rather simple: at the global settings level, override 
> the throttle configs. They should generally be inherited from 
> upstream; if they weren't, let me know and I can try to point you to a 
> db update you can do.
>
> Once that's done, the next time you deploy a VM it will check the 
> network portgroup it has created and update it. You can also try 
> stopping and starting the VM; that may update the portgroup configs as 
> well (not 100% certain, but I think it will work). This behavior 
> definitely applies to VMware; I'd think the same would go for other 
> hypervisors like Xen and KVM, but I don't have Xen or KVM to try this on.
>
> One other suggestion: I would ask on the dev list. There is an 
> updateNetwork API call that you could make, which presumably will 
> update these settings; the description for this call is rather brief, 
> so the devs would know better.
>
> http://cloudstack.apache.org/docs/api/apidocs-4.4/root_admin/updateNetwork.html 
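>
> A rough sketch of what that call might look like from cloudmonkey, 
> assuming you know the network UUID and the UUID of an offering with the 
> rate you want (whether this actually reapplies the throttle is exactly 
> the question for the dev list):
>
>    update network id=<network-uuid> networkofferingid=<new-offering-uuid>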
>
>
> Regards
> ilya
>
> On 8/17/14, 4:38 PM, Nick Burke wrote:
>> Another update:
>>
>>
>> 100% confirmed to be traffic shaping set by CloudStack. I don't know
>> where/how/why, and I'd love some help with this. Should I create a new
>> thread? As previously mentioned, I don't believe I've set a cap below
>> 100 Mb/s ANYWHERE in CloudStack: not in compute offerings, not in network
>> offerings, and not in the default throttle (which is set at 200).
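>>
>> For anyone following along, these can be checked from cloudmonkey
>> roughly like this; the setting and field names below are from memory,
>> so treat them as a sketch:
>>
>>   list configurations name=network.throttling.rate
>>   list configurations name=vm.network.throttling.rate
>>   list serviceofferings filter=name,networkrate
>>   list networkofferings filter=name,networkrate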
>>
>> What am I missing?
>>
>> I removed tc rules on the host for two test instances and bandwidth 
>> shot up.
>>
>> Before:
>>
>> ubuntu@testserver01:~$ iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59276
>> [ ID] Interval       Transfer     Bandwidth
>> [  4]  0.0-10.4 sec  6.62 MBytes  5.35 Mbits/sec
>> [  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59277
>> [  5]  0.0-10.5 sec  6.62 MBytes  5.28 Mbits/sec
>> [  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59278
>> [  4]  0.0-10.4 sec  6.62 MBytes  5.37 Mbits/sec
>> [  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59291
>> [  5]  0.0-10.3 sec  6.62 MBytes  5.37 Mbits/sec
>> [  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59306
>> [  4]  0.0-10.5 sec  6.62 MBytes  5.30 Mbits/sec
>>
>> Removed the rules for two instances on the same host:
>>
>> ubuntu@dom02:~$ sudo tc qdisc del dev vnet1 root
>> ubuntu@dom02:~$ sudo tc qdisc del dev vnet3 root
>> ubuntu@dom02:~$ sudo tc qdisc del dev vnet3 ingress
>> ubuntu@dom02:~$ sudo tc qdisc del dev vnet1 ingress
>> ubuntu@dom02:~$ tc -s qdisc ls dev vnet1
>> qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>>   Sent 7136572 bytes 1048 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
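>>
>> For the record, the rate that was applied can be read back before
>> deleting the rules. A rough sketch, assuming the usual libvirt-style
>> shaping with an HTB root qdisc for egress and a policing filter on
>> ingress:
>>
>>   # egress: HTB classes show the configured rate/ceil
>>   tc class show dev vnet1
>>   # ingress: the police filter shows the inbound rate cap
>>   tc filter show dev vnet1 parent ffff: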
>>
>> And all of a sudden, those two instances are at blazing speeds:
>>
>> ubuntu@testserver01:~$ iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59322
>> [ ID] Interval       Transfer     Bandwidth
>> [  4]  0.0-10.0 sec  14.8 GBytes  12.7 Gbits/sec
>> [  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59329
>> [  5]  0.0-10.0 sec  19.1 GBytes  16.4 Gbits/sec
>> [  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59330
>> [  4]  0.0-10.0 sec  19.0 GBytes  16.3 Gbits/sec
>>
>>
>>
>>
>>
>> On Sun, Aug 17, 2014 at 12:46 PM, Nick Burke <nick@nickburke.com> wrote:
>>
>>> First,
>>>
>>> THANK YOU FOR REPLYING!
>>>
>>> Second, yes, it's currently set at 200.
>>>
>>> The compute offering's network rate is either blank (or, when I tested it,
>>> 1000).
>>> The network offering's network rate limit is either 100, 1000, or blank.
>>>
>>>
>>> Those are the only network throttling parameters that I'm aware of; are
>>> there any others that I missed? Is it possible disk I/O is for some
>>> reason coming into play here?
>>>
>>> This happens regardless of whether the instance network uses a virtual
>>> router or is directly connected to a VLAN (i.e., no virtual router),
>>> when two instances are directly connected to each other.
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Aug 17, 2014 at 12:09 PM, ilya musayev <ilya.mailing.lists@gmail.com> wrote:
>>>
>>>> Nick
>>>>
>>>> Have you checked the network throttle settings in "Global Settings" and
>>>> wherever else they may be defined?
>>>>
>>>> regards
>>>> ilya
>>>>
>>>> On 8/17/14, 11:27 AM, Nick Burke wrote:
>>>>
>>>>> Update:
>>>>>
>>>>> After running iperf on instances on the same virtual network, it looks
>>>>> like all instances can get no more than 2 Mb/s. Additionally, it's
>>>>> sporadic and ranges from under 1 Mb/s, but never more than 2 Mb/s:
>>>>>
>>>>> user@localhost:~$ iperf -c 10.1.0.1 -d
>>>>> ------------------------------------------------------------
>>>>> Server listening on TCP port 5001
>>>>> TCP window size: 85.3 KByte (default)
>>>>> ------------------------------------------------------------
>>>>> ------------------------------------------------------------
>>>>> Client connecting to 10.1.0.1, TCP port 5001
>>>>> TCP window size: 86.8 KByte (default)
>>>>> ------------------------------------------------------------
>>>>> [  5] local 10.1.0.10 port 50432 connected with 10.1.0.1 port 5001
>>>>> [ ID] Interval       Transfer     Bandwidth
>>>>> [  5]  0.0-11.0 sec  1.25 MBytes   950 Kbits/sec
>>>>> [  4] local 10.1.0.10 port 5001 connected with 10.1.0.1 port 53839
>>>>> [  4]  0.0-11.1 sec  2.50 MBytes  1.89 Mbits/sec
>>>>> user@localhost:~$ iperf -c 10.1.0.1 -d
>>>>> ------------------------------------------------------------
>>>>> Server listening on TCP port 5001
>>>>> TCP window size: 85.3 KByte (default)
>>>>> ------------------------------------------------------------
>>>>> ------------------------------------------------------------
>>>>> Client connecting to 10.1.0.1, TCP port 5001
>>>>> TCP window size: 50.3 KByte (default)
>>>>> ------------------------------------------------------------
>>>>> [  5] local 10.1.0.10 port 52248 connected with 10.1.0.1 port 5001
>>>>> [ ID] Interval       Transfer     Bandwidth
>>>>> [  5]  0.0-12.6 sec  1.25 MBytes   834 Kbits/sec
>>>>> [  4] local 10.1.0.10 port 5001 connected with 10.1.0.1 port 53840
>>>>> [  4]  0.0-11.9 sec  2.13 MBytes  1.49 Mbits/sec
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 15, 2014 at 11:40 AM, Nick Burke <nick@nickburke.com> wrote:
>>>>>
>>>>>> I upgraded from 4.0 to 4.3.0 some time ago. I didn't restart anything
>>>>>> and it was all working great. However, I had to perform some maintenance
>>>>>> and had to restart everything. Now, I'm seeing packet loss on all virtuals,
>>>>>> even ones on the same host.
>>>>>>
>>>>>> sudo ping -c 500  -f 172.20.1.1
>>>>>> PING 172.20.1.1 (172.20.1.1) 56(84) bytes of data.
>>>>>> ........................................
>>>>>> --- 172.20.1.1 ping statistics ---
>>>>>> 500 packets transmitted, 460 received, 8% packet loss, time 864ms
>>>>>> rtt min/avg/max/mdev = 0.069/0.218/1.290/0.139 ms, ipg/ewma 1.731/0.328 ms
>>>>>>
>>>>>> No interface errors are reported anywhere. The host itself isn't under
>>>>>> load at all. It doesn't matter whether the instance uses e1000 or virtio
>>>>>> drivers. The only thing I'm aware of that changed is that I had to
>>>>>> reboot all the physical servers.
>>>>>>
>>>>>>
>>>>>> It could be related, but I was hit by the
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6464
>>>>>>
>>>>>> bug. I did follow Marcus' suggestion:
>>>>>>
>>>>>>
>>>>>> *"This is a shot in the dark, but there have been some issues around
>>>>>>
>>>>>> upgrades that involve the cloud.vlan table expected contents 
>>>>>> changing.
>>>>>> New
>>>>>> 4.3 installs using vlan isolation don't seem to reproduce the issue.
>>>>>> I'll
>>>>>> see if I can reproduce anything like this with basic and/or non-vlan
>>>>>> isolated upgrades/installs. Can anyone experiencing an issue look
at
>>>>>> their
>>>>>> database via something like "select * from cloud.vlan" and look 
>>>>>> at the
>>>>>> vlan_id. If you see something like "untagged" instead of
>>>>>> "vlan://untagged",
>>>>>> please try changing it and see if that helps."*
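>>>>>>
>>>>>> For reference, that check and fix boil down to something like the
>>>>>> following against the management server database. This is only a sketch:
>>>>>> the db/user names assume the defaults, and the database should be backed
>>>>>> up first.
>>>>>>
>>>>>>   mysql -u cloud -p -e "select id, vlan_id from cloud.vlan;"
>>>>>>   mysql -u cloud -p -e "update cloud.vlan set vlan_id='vlan://untagged' where vlan_id='untagged';"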
>>>>>>
>>>>>> -- 
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *'What is a human being, then?' 'A seed' 'A... seed?' 'An acorn that is
>>>>>> unafraid to destroy itself in growing into a tree.' -David Zindell, A
>>>>>> Requiem for Homo Sapiens*
>>>>>>
>>>>>>
>>>>>
>>>
>>> -- 
>>> Nick
>>>
>>>
>>>
>>>
>>>
>>> *'What is a human being, then?' 'A seed' 'A... seed?' 'An acorn that is
>>> unafraid to destroy itself in growing into a tree.' -David Zindell, A
>>> Requiem for Homo Sapiens*
>>>
>>
>>
>

