cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning
Date Fri, 10 Jan 2014 20:09:50 GMT


Tyler Hobbs commented on CASSANDRA-6465:

[~ianbarfield] thanks for the analysis, you make some excellent observations.

From the discussion in CASSANDRA-3722, it seems like the two motivations for the time
penalty were these:
# When a node dies, the FD will not mark it down for a while; in the meantime, we'd like to
stop sending queries to it
# In a multi-DC setup, we would like to penalize the remote DC, but not so much that we won't
ever use it when local nodes become very slow

I suspect that rapid read protection (CASSANDRA-4705) does a good job of mitigating the #1
case until the FD marks the node down.  I'll do some testing to confirm this.

I don't feel like the #2 case needs special treatment from the dynamic snitch, especially
with the badness_threshold in effect.  Latency to the remote DC should prevent it from being
used under normal circumstances.  If users really want to guarantee that, the LOCAL consistency
levels are always available.
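
To make the badness_threshold reasoning concrete, here is a minimal sketch of the rule as the conf comment describes it. This is not the dynamic snitch's actual code; the function name and the assumption that lower scores are better are mine, for illustration only.

```python
def keeps_pinned_replica(pinned_score, best_score, badness_threshold):
    """Return True while the pinned (static snitch) replica is still preferred.

    Assumption: scores are latency-like, so a lower score is better.
    Hypothetical helper, not part of Cassandra.
    """
    if badness_threshold <= 0:
        # Pinning is effectively off: stay only if the pinned replica
        # is already at least as good as the best one.
        return pinned_score <= best_score
    # Stay pinned until the pinned replica is more than
    # `badness_threshold` (as a fraction) worse than the best score.
    return pinned_score <= best_score * (1.0 + badness_threshold)


if __name__ == "__main__":
    # A remote-DC replica with a much higher score loses its preference.
    print(keeps_pinned_replica(1.05, 1.0, 0.1))
    print(keeps_pinned_replica(1.80, 1.0, 0.1))
```

Under this rule a remote-DC replica whose score is well above the local ones is skipped without any extra time penalty, which is the point made for the #2 case.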

> DES scores fluctuate too much for cache pinning
> -----------------------------------------------
>                 Key: CASSANDRA-6465
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.11, 2 DC cluster
>            Reporter: Chris Burroughs
>            Assignee: Tyler Hobbs
>            Priority: Minor
>              Labels: gossip
>             Fix For: 2.0.5
>         Attachments: des-score-graph.png, des.sample.15min.csv,
> To quote the conf:
> {noformat}
> # if set greater than zero and read_repair_chance is < 1.0, this will allow
> # 'pinning' of replicas to hosts in order to increase cache capacity.
> # The badness threshold will control how much worse the pinned host has to be
> # before the dynamic snitch will prefer other replicas over it.  This is
> # expressed as a double which represents a percentage.  Thus, a value of
> # 0.2 means Cassandra would continue to prefer the static snitch values
> # until the pinned host was 20% worse than the fastest.
> dynamic_snitch_badness_threshold: 0.1
> {noformat}
> An assumption of this feature is that scores will vary by less than dynamic_snitch_badness_threshold
during normal operations.  Attached is the result of polling a node for the scores of 6 different
endpoints at 1 Hz for 15 minutes.  The endpoints to sample were chosen with `nodetool getendpoints`
for a row that is known to get reads.  The node was acting as a coordinator for a few hundred
req/second, so it should have sufficient data to work with.  Other traces on a second cluster
have produced similar results.
>  * The scores vary by far more than I would expect, as shown by the difficulty of seeing
anything useful in that graph.
>  * The difference between the best and next-best score is usually > 10% (default dynamic_snitch_badness_threshold).
> Neither ClientRequest nor ColumnFamily metrics showed wild changes during the data gathering.
> Attachments:
>  * Jython script cobbled together to gather the data (based on work on the mailing list
from Maki Watanabe a while back)
>  * CSV of DES scores for 6 endpoints, polled about once a second
>  * Attempt at making a graph
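
The observation that the best and next-best scores usually differ by more than 10% can be checked against sampled data along these lines. This is a rough sketch only: the layout of the attached CSV is not shown here, so the input is assumed to be a list of rows, each holding one score per endpoint for a single poll, with lower scores being better. The function name is hypothetical.

```python
def fraction_exceeding_threshold(samples, badness_threshold=0.1):
    """Fraction of polls where the gap between the best and next-best
    score is larger than badness_threshold (relative to the best score).

    Assumptions: each row is one poll with one score per endpoint,
    and lower scores are better.
    """
    exceeded = 0
    counted = 0
    for scores in samples:
        if len(scores) < 2:
            continue  # need at least two endpoints to compare
        ordered = sorted(scores)
        best, next_best = ordered[0], ordered[1]
        if best <= 0:
            continue  # skip rows with no usable best score (avoids division by zero)
        counted += 1
        if (next_best - best) / best > badness_threshold:
            exceeded += 1
    return exceeded / counted if counted else 0.0


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not taken from the attachment.
    polls = [
        [1.0, 1.05, 2.0],
        [1.0, 1.5, 3.0],
        [0.5, 0.9, 0.6],
    ]
    print(fraction_exceeding_threshold(polls, 0.1))
```

A result close to 1.0 on real samples would mean the threshold is exceeded on most polls, so pinning rarely holds.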
