incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Viktor Jevdokimov <Viktor.Jevdoki...@adform.com>
Subject RE: Replication factor
Date Thu, 24 May 2012 13:24:40 GMT
All data is in the page cache. No repairs. Compactions not hitting disk for read. CPU <50%.
ParNew GC <100 ms in average.

After one compaction completes, new sstable is not in page cache, there may be a disk usage
spike before data is cached, so local reads gets slower for a moment, comparing with other
nodes. Redirecting almost all requests to other nodes finally ends up with a huge latency
spike almost on all nodes, especially when ParNew GC may spike on one node (>200ms). We
call it "cluster hiccup", when incoming and outgoing network traffic drops for a moment.

And such hiccups happens several times an hour, few seconds long. Playing with badness threshold
did not gave a lot better results, but turning DS off completely fixed all problems with latencies,
node spikes, cluster hiccups and network traffic drops.

In our case, our client is selecting endpoints for a key by calculating a token, so we always
hit a replica.




Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<mailto:Viktor.Jevdokimov@adform.com>
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
What is Adform: watch this short video<http://vimeo.com/adform/display>

[Adform News] <http://www.adform.com>


Disclaimer: The information contained in this message and attachments is intended solely for
the attention and use of the named addressee and may be confidential. If you are not the intended
recipient, you are reminded that the information remains the property of the sender. You must
not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this
message in error, please contact the sender immediately and irrevocably delete this message
and any copies.

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Thursday, May 24, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

Your experience is when using CL ONE the Dynamic Snitch is moving local reads off to other
nodes and this is causing spikes in read latency ?

Did you notice what was happening on the node for the DS to think it was so slow ? Was compaction
or repair going on ?

Have you played with the badness threshold https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472
?

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:


Depends on use case. For ours we have another experience and statistics, when turning dynamic
snitch off makes overall latency and spikes much, much lower.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<mailto:Viktor.Jevdokimov@adform.com>
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
What is Adform: watch this short video<http://vimeo.com/adform/display>

<signature-logo29.png><http://www.adform.com>


Disclaimer: The information contained in this message and attachments is intended solely for
the attention and use of the named addressee and may be confidential. If you are not the intended
recipient, you are reminded that the information remains the property of the sender. You must
not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this
message in error, please contact the sender immediately and irrevocably delete this message
and any copies.

From: Brandon Williams [mailto:driftx@gmail.com]<mailto:[mailto:driftx@gmail.com]>
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov <Viktor.Jevdokimov@adform.com<mailto:Viktor.Jevdokimov@adform.com>>
wrote:
> When RF == number of nodes, and you read at CL ONE you will always be reading locally.
"always be reading locally" - only if Dynamic Snitch is "off". With dynamic snitch "on" request
may be redirected to other node, which may introduce latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the local replica
is in worse shape than the rest (compacting, repairing, etc.)

-Brandon


Mime
View raw message