incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: High disk read throughput on only one node.
Date Fri, 21 Dec 2012 10:09:42 GMT
It looks like nobody has experienced this kind of trouble before, or even
has a clue about it.

Under heavy load this creates high latency (because of iowait) in my
production app, and we can't tolerate it much longer. If there is nothing new
in the next few days I think I'll drop this node and replace it, hoping this
will fix my issue...

I'll wait a bit longer because I'm hoping we will find out what the issue is,
and that this will help the C* community.
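For reference, if we end up experimenting with the snitch threshold Bryan mentions below, it lives in cassandra.yaml (a hedged sketch; the value shown is only what I understand to be the documented 1.1 default, not a recommendation):

```yaml
# cassandra.yaml -- how much worse the preferred replica's latency score
# must be before the dynamic snitch diverts reads to another replica.
# 0.1 roughly means "tolerate up to 10% worse before re-routing"; raising
# it makes the snitch less eager to pile traffic onto the best-scoring node.
dynamic_snitch_badness_threshold: 0.1
```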
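Side note for anyone eyeballing the three cfstats dumps linked below: a quick, hedged sketch for totalling the read counts per node so the dumps are easier to compare (`total_reads` is just a hypothetical helper; the regex assumes the usual `Read Count:` lines in 1.1-era cfstats output):

```python
import re

def total_reads(cfstats_text: str) -> int:
    """Sum every 'Read Count:' value in a `nodetool cfstats` dump."""
    return sum(int(n) for n in re.findall(r"Read Count:\s*(\d+)", cfstats_text))

# Tiny excerpt in the cfstats style, just to show the idea:
sample = """\
Keyspace: app
        Read Count: 1200
        Write Count: 800
                Column Family: users
                Read Count: 700
"""
print(total_reads(sample))  # prints 1900
```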




2012/12/20 Alain RODRIGUEZ <arodrime@gmail.com>

> "routing more traffic to it?"
>
> So shouldn't I see more "network in" on that node in the AWS console?
>
> It seems that each node is receiving and sending an equal amount of data.
>
> What value should I use for dynamic-snitch-badness-threshold to give it a
> try?
> On Dec 20, 2012, at 00:37, "Bryan Talbot" <btalbot@aeriagames.com> wrote:
>
> Oh, you're on EC2.  Maybe the dynamic snitch is detecting that one node is
>> performing better than the others and is routing more traffic to it?
>>
>>
>> http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold
>>
>> -Bryan
>>
>>
>>
>>
>> On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>
>>> @Aaron
>>> "Is there a sustained difference or did it settle back ? "
>>>
>>> Sustained, clearly. During the day all nodes read at about 6 MB/s while
>>> this one reads at 30-40 MB/s. At night, while the others read at 2 MB/s,
>>> the "broken" node reads at 8-10 MB/s.
>>>
>>> "Could this have been compaction or repair or upgrade tables working ? "
>>>
>>> That was my first thought, but definitely not: this occurs continuously.
>>>
>>> "Do the read / write counts available in nodetool cfstats show anything
>>> different ? "
>>>
>>> The cfstats output shows different counts (a lot fewer reads/writes for
>>> the "bad" node), but the nodes didn't join the ring at the same time. I'm
>>> attaching the cfstats output just in case it helps somehow.
>>>
>>> Node  38: http://pastebin.com/ViS1MR8d (bad one)
>>> Node  32: http://pastebin.com/MrSTHH9F
>>> Node 154: http://pastebin.com/7p0Usvwd
>>>
>>> @Bryan
>>>
>>>  "clients always connect to that server"
>>>
>>> I didn't include it in the screenshot from the AWS console, but AWS
>>> reports (almost) equal network in across the nodes (same for network out
>>> and CPU). The CPU load is a lot higher on the broken node, as shown by
>>> OpsCenter, but that's due to the high iowait...
>>>
>>
>>
>>
>> --
>> Bryan Talbot
>> Architect / Platform team lead, Aeria Games and Entertainment
>> Silicon Valley | Berlin | Tokyo | Sao Paulo
>>
>>
>>
