cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Network / GC / Latency spike
Date Tue, 01 Sep 2015 10:05:04 GMT
Hi Fabien, thanks for your help.

I did not mention it, but I did see a correlation between the latency spikes
and read repair spikes. That said, this only means going from 5 RR per second
to 10 per second cluster-wide, according to OpsCenter: http://img42.com/L6gx1

I do have some wide rows, and this explanation looks reasonable to me. Yet
isn't this amount of read repair too low to induce such a "shitstorm" (even
if RR spikes x2, my network spikes x10)? Also, the wide rows are in heavily
used tables (sadly...), so I would expect to be using more network all the
time. Why only a few spikes per day (2 or 3 at most)?
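
As a sanity check on the stream throughput cap mentioned in my first message
(quoted below: 1 Mbps per node, ~30 nodes), here is a minimal Python sketch of
the unit conversion. The function and constant names are only illustrative;
the figures are the ones from this thread.

```python
# Figures taken from the quoted message: 1 Mbps cap per node, ~30 nodes.
PER_NODE_CAP_MBPS = 1   # megabits per second, per node
NODE_COUNT = 30         # approximate number of nodes in the DC


def aggregate_cap(per_node_mbps, nodes):
    """Return the DC-wide cap as (megabits/s, megabytes/s)."""
    total_mbps = per_node_mbps * nodes
    return total_mbps, total_mbps / 8  # 8 bits per byte


total_mbps, total_mbytes = aggregate_cap(PER_NODE_CAP_MBPS, NODE_COUNT)
print(f"{total_mbps} Mbps ~= {total_mbytes:.2f} MB/s")  # 30 Mbps ~= 3.75 MB/s
```

Traffic well above that ceiling would be consistent with the burst not being
governed by the stream throughput limit, though this arithmetic alone does not
confirm where the traffic comes from.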

How could I confirm this without disabling RR and waiting a week? Is there a
way to see the size of the data being repaired through this mechanism?

C*heers

Alain

2015-09-01 0:11 GMT+02:00 Fabien Rousseau <fabifabi95@gmail.com>:

> Hi Alain,
>
> Could it be wide rows + read repair ? (Let's suppose the "read repair"
> repairs the full row, and it may not be subject to stream throughput limit)
>
> Best Regards
> Fabien
>
> 2015-08-31 15:56 GMT+02:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>
>> I just realised that I have no idea how this mailing list handles
>> attached files.
>>
>> Please find the screenshots here --> http://img42.com/collection/y2KxS
>>
>> Alain
>>
>> 2015-08-31 15:48 GMT+02:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>>
>>> Hi,
>>>
>>> Running a 2.0.16 C* on AWS (private VPC, 2 DC).
>>>
>>> I am facing an issue on our EU DC where I see network bursts
>>> (along with GC and latency increases).
>>>
>>> My first thought was a sudden application burst, though I see no
>>> corresponding change in reads / writes or even CPU.
>>>
>>> So I thought that this might come from the nodes themselves, as network
>>> IN almost equals network OUT. I tried lowering stream throughput on the
>>> whole DC to 1 Mbps; with ~30 nodes --> 30 Mbps --> ~4 MB/s max. My network
>>> went a lot higher, about 30 M in both directions (see screenshots attached).
>>>
>>> I have tried to use iftop to see where this traffic is headed, but I
>>> was not able to because the bursts are very short.
>>>
>>> So, questions are:
>>>
>>> - Has anyone experienced something similar already? If so, any clue
>>> would be appreciated :).
>>> - How can I find out (monitor, capture) where this large amount of
>>> network traffic is headed or what it is due to?
>>> - Am I right to try to figure out what this network traffic is, or
>>> should I follow another lead?
>>>
>>> Note: I also noticed that CPU does not spike, nor do reads & writes,
>>> but disk reads do spike!
>>>
>>> C*heers,
>>>
>>> Alain
>>>
>>
>>
>
