cassandra-user mailing list archives

From Kyrylo Lebediev <Kyrylo_Lebed...@epam.com.INVALID>
Subject Re: dynamic_snitch=false, prioritisation/order or reads from replicas
Date Wed, 08 Aug 2018 13:35:13 GMT
Thank you for explaining, Alain!


Predetermining the nodes to query, then sending a 'data' request to one of them and a 'digest'
request to another (for CL=QUORUM, RF=3) indeed explains the more effective use of the filesystem
cache when dynamic snitching is disabled.


So, for each token range there will be one or more replicas that are never queried (2 replicas
for CL=ONE, 1 replica for CL=QUORUM, with RF=3). But since data is evenly distributed across
all nodes in the cluster, it looks like such load redistribution shouldn't cause any issues,
except in the case you mentioned, when a node has performance issues but all requests are
still sent to it anyway.
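
The replica counts above follow from the quorum formula. A minimal sketch, assuming the coordinator
contacts exactly as many replicas as the consistency level requires (no speculative retry, no
read repair); the helper names are hypothetical and not part of Cassandra:

```python
def replicas_contacted(consistency_level: str, rf: int) -> int:
    """Number of replicas a read must reach for the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if consistency_level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def replicas_idle(consistency_level: str, rf: int) -> int:
    """Replicas of a token range that a fixed replica order never reads from."""
    return rf - replicas_contacted(consistency_level, rf)


for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, replicas_contacted(cl, 3), replicas_idle(cl, 3))
```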


Regards,

Kyrill


________________________________
From: Alain RODRIGUEZ <arodrime@gmail.com>
Sent: Wednesday, August 8, 2018 1:27:50 AM
To: user cassandra.apache.org
Subject: Re: dynamic_snitch=false, prioritisation/order or reads from replicas

Hello Kyrill,

> But in case of CL=QUORUM/LOCAL_QUORUM, if I'm not wrong, read request is sent to all replicas
> waiting for first 2 to reply.

My understanding is that this sentence is wrong. It is indeed as you described for writes:
all the replicas get the information (in all the data centers). That is not the case for
reads. For reads, x nodes are picked and used (x = ONE, QUORUM, ALL, ...).

> Looks like the only change for dynamic_snitch=false is that "data" request is sent to a determined
> node instead of "currently the fastest one".

Indeed, the problem is that the 'currently the fastest one' changes very often in certain
cases, which makes the caches less efficient, often without enough compensation.
The idea of not using the 'bad' nodes is interesting, as it gives more predictable latencies
when a node is slow for some reason. Yet one of the side effects of this (and of the scoring,
which does not seem to be absolutely reliable) is that the clients are often routed to different
nodes when the cluster is under pressure, due to GC pauses for example.
Saving disk reads in read-heavy workloads under pressure is more important than trying to
save a few milliseconds by picking the 'best' node, I guess.
I can imagine that relieving the disks by reducing disk IO/throughput ends up lowering the
latency for all the nodes, so the client application latency improves overall.
That is my understanding of why it is so often good to disable the dynamic_snitch.
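
The cache argument can be illustrated with a toy model. This is only a sketch under strong
assumptions: each node's filesystem cache is modelled as a small LRU cache, keys are read
uniformly at random, 'pinned' means a key is always read from the same replica, and 'unpinned'
exaggerates score-driven switching by picking a replica at random for every read. None of the
names or numbers come from Cassandra itself.

```python
import random
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache standing in for a node's filesystem (page) cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def read(self, key):
        """Return True on a cache hit; on a miss, load the key (a 'disk read')."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return False


def hit_ratio(pinned, reads=30_000, keys=300, nodes=3, capacity=100, seed=42):
    """Fraction of reads served from cache under the chosen routing policy."""
    rng = random.Random(seed)
    caches = [LruCache(capacity) for _ in range(nodes)]
    hits = 0
    for _ in range(reads):
        key = rng.randrange(keys)
        # Pinned: a key is always read from the same replica.
        # Unpinned: the replica changes from read to read.
        node = key % nodes if pinned else rng.randrange(nodes)
        if caches[node].read(key):
            hits += 1
    return hits / reads


print(f"pinned:   {hit_ratio(pinned=True):.2f}")
print(f"unpinned: {hit_ratio(pinned=False):.2f}")
```

In this model each pinned node only ever sees a third of the keys, so its cache can hold all of
them, while every unpinned node sees the whole key space and keeps evicting entries.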

> Did you get improved response for CL=ONE only or for higher CL's as well?

I must admit I don't remember for sure, but many people are using 'LOCAL_QUORUM' and I think
I saw this for this consistency level as well. Plus this question might no longer stand as
reads in Cassandra work slightly differently than what you thought.

I am not 100% comfortable with this 'dynamic_snitch theory' topic, so I hope someone else
can correct me if I am wrong, confirm or add information :). But for sure I have seen disabling
it give some really nice improvements (as have many others here, as you mentioned). Sometimes
it was not helpful, but I have never seen this change be really harmful.

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-08-06 22:27 GMT+01:00 Kyrylo Lebediev <Kyrylo_Lebediev@epam.com.invalid>:

Thank you for replying, Alain!


Better use of cache for 'pinned' requests explains the CL=ONE case well.


But in case of CL=QUORUM/LOCAL_QUORUM, if I'm not wrong, read request is sent to all replicas
waiting for first 2 to reply.

When dynamic snitching is turned on, the "data" request is sent to "the fastest replica", and
"digest" requests go to the rest of the replicas.

But anyway, a digest request is the same read operation [from SSTables through filesystem cache]
plus calculating a hash and sending it to the coordinator. Looks like the only change for
dynamic_snitch=false is that "data" request is sent to a determined node instead of "currently
the fastest one".
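
The data/digest distinction described above can be sketched as follows. This is a simplified
illustration, not Cassandra's implementation: MD5 is only a stand-in hash, the function names
are hypothetical, and the row contents are made up.

```python
import hashlib


def data_response(row: bytes) -> bytes:
    """Replica answering a 'data' request: returns the row itself."""
    return row


def digest_response(row: bytes) -> bytes:
    """Replica answering a 'digest' request: same local read, but only a hash is sent back."""
    return hashlib.md5(row).digest()


def replicas_agree(data: bytes, digest: bytes) -> bool:
    """Coordinator side: hash the full data response and compare it with the digest."""
    return hashlib.md5(data).digest() == digest


current_row = b"k=1,v=new"  # made-up row, identical on two replicas
stale_row = b"k=1,v=old"    # made-up row on a replica that missed a write

print(replicas_agree(data_response(current_row), digest_response(current_row)))  # True
print(replicas_agree(data_response(current_row), digest_response(stale_row)))    # False
```

Both request types read the row locally; only the size of the reply to the coordinator differs.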

So, if there are no mistakes in the above description, the improvement shouldn't be very visible
for CL=*QUORUM...


Did you get improved response for CL=ONE only or for higher CL's as well?


Indeed an interesting thread in Jira.


Thanks,

Kyrill

________________________________
From: Alain RODRIGUEZ <arodrime@gmail.com>
Sent: Monday, August 6, 2018 8:26:43 PM
To: user cassandra.apache.org
Subject: Re: dynamic_snitch=false, prioritisation/order or reads from replicas

Hello,

> There are reports (in this ML too) that disabling dynamic snitching decreases response time.

I confirm that I have seen this improvement on clusters under pressure.

> What effects stand behind this improvement?

My understanding is that this is due to the fact that the clients are then 'pinned', sticking
more to specific nodes when dynamic snitching is off. I guess there is a better use
of caches and in-memory structures, reducing the number of disk reads needed, which can lead
to much better performance than switching from node to node as soon as the score of some node
is not good enough.
I am also not sure that the score calculation is always relevant, so increasing the threshold
before switching reads to another node is still often worse than disabling it completely.
I am not sure if the score calculation was fixed, but in most cases, I think it's safer to
run with 'dynamic_snitch: false'. Anyway, it's possible to test it on a canary node (or entire
rack) and look at the p99 for read latencies for example :).
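
For the canary comparison, a p99 can be computed from collected read latencies with a
nearest-rank percentile. A minimal sketch; the helper name is hypothetical and the latency
values are invented placeholders, not measurements from any cluster:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Placeholder latencies in milliseconds, invented for the example.
baseline_ms = [2.0] * 98 + [40.0, 55.0]  # node with dynamic snitching left on
canary_ms = [2.0] * 99 + [30.0]          # canary node with it turned off

print("baseline p99:", percentile(baseline_ms, 99))
print("canary p99:  ", percentile(canary_ms, 99))
```

Comparing the two over the same time window and the same workload keeps the test meaningful.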

This ticket is old, but was precisely on that topic: https://issues.apache.org/jira/browse/CASSANDRA-6908

C*heers
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-08-04 15:37 GMT+02:00 Kyrylo Lebediev <Kyrylo_Lebediev@epam.com.invalid>:

Hello!


When dynamic snitching is enabled, data is read from 'the fastest replica' and the other
replicas send digests for CL=QUORUM/LOCAL_QUORUM.

When dynamic snitching is disabled, as the concept of the fastest replica disappears, which
rules are used to choose from which replica to read actual data (not digests):

 1) when all replicas are online

 2) when the node primarily responsible for the token range is offline.


There are reports (in this ML too) that disabling dynamic snitching decreases response time.

What effects stand behind this improvement?


Regards,

Kyrill


