cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6194) speculative retry can sometimes violate consistency
Date Tue, 22 Oct 2013 20:08:47 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802211#comment-13802211 ]

Jonathan Ellis commented on CASSANDRA-6194:
-------------------------------------------

The problem is that when we speculate, we do multiple data reads, and RowDigestResolver assumes
there is only one data read.  (If there is more than one, it does not error out but silently
drops all but one.)

So if the speculative read triggers the callback's "we have enough replies to
satisfy CL" logic, and the speculative data read finishes before the digest, we effectively
do CL.ONE logic instead of CL.QUORUM.
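
To make the failure mode concrete, here is a toy model in Python. It is not Cassandra's code: the classes `Reply`, `NaiveResolver`, `ComparingResolver` and the function `read` are hypothetical names invented for this sketch, and replication factor 3 with a QUORUM of 2 is an assumption. It only illustrates how a resolver that keeps a single data reply can return one replica's view without comparing anything once two data replies satisfy the reply count.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


class DigestMismatch(Exception):
    """Replicas disagree; a real coordinator would follow up with a full read."""


def digest(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()


@dataclass
class Reply:
    replica: str
    value: str
    is_digest: bool  # a digest reply carries only a hash of the row


class NaiveResolver:
    """Hypothetical resolver that assumes there is exactly one data reply."""

    def __init__(self) -> None:
        self.data: Optional[str] = None
        self.digests: List[str] = []

    def add(self, reply: Reply) -> None:
        if reply.is_digest:
            self.digests.append(digest(reply.value))
        else:
            self.data = reply.value  # silently drops any earlier data reply

    def resolve(self) -> str:
        assert self.data is not None
        expected = digest(self.data)
        if any(d != expected for d in self.digests):
            raise DigestMismatch()
        return self.data


class ComparingResolver:
    """Hypothetical alternative that compares every reply, data or digest."""

    def __init__(self) -> None:
        self.all_data: List[str] = []
        self.digests: List[str] = []

    def add(self, reply: Reply) -> None:
        if reply.is_digest:
            self.digests.append(digest(reply.value))
        else:
            self.all_data.append(reply.value)

    def resolve(self) -> str:
        seen = {digest(v) for v in self.all_data} | set(self.digests)
        if len(seen) > 1:
            raise DigestMismatch()
        return self.all_data[0]


def read(resolver, replies: List[Reply], block_for: int) -> str:
    """Feed replies in arrival order; resolve once block_for have arrived."""
    for count, reply in enumerate(replies, start=1):
        resolver.add(reply)
        if count >= block_for:
            return resolver.resolve()
    raise TimeoutError("not enough replies")


# Arrival order: original data read, speculative data read, then the digest.
replies = [
    Reply("A", "fresh", is_digest=False),
    Reply("C", "stale", is_digest=False),  # speculative data read
    Reply("B", "fresh", is_digest=True),   # arrives too late to be compared
]
naive_result = read(NaiveResolver(), replies, block_for=2)
```

In this sketch the naive resolver hands back the row from a single replica with no comparison at all, which is the CL.ONE behavior described above, while the comparing resolver raises `DigestMismatch` on the same replies.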

https://github.com/jbellis/cassandra/commits/6194 includes a fix for this and also a fix for
DigestMismatch logic with SR.

> speculative retry can sometimes violate consistency
> ---------------------------------------------------
>
>                 Key: CASSANDRA-6194
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6194
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Jonathan Ellis
>             Fix For: 2.0.2
>
>         Attachments: 6194.txt
>
>
> This is most evident with intermittent failures of the short_read dtests.  I'll focus
> on short_read_reversed_test for explanation, since that's what I used to bisect.  This test
> inserts some columns into a row, then deletes a subset, but it performs each delete on a different
> node, with another node down (hints are disabled).  Finally it reads the row back at QUORUM
> and checks that it doesn't see any deleted columns; however, with speculative retry on, this
> often fails.  I bisected this reliably to the change that made 99th percentile SR the default,
> looping the test enough times at each iteration to be sure it was passing or failing.
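
The scenario in the description can be approximated with a small Python model. This is not the dtest itself: the node names, the helpers `write`, `merge` and `live_columns`, the three-node cluster, and the assumption that each delete reaches every node that is up are all made up for illustration. It shows why a read that merges any two replicas hides the deleted columns, while a read served from a single replica may not.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Cell = Tuple[int, Optional[str]]  # (timestamp, value); value None is a tombstone
Store = Dict[str, Cell]


def write(replicas: Dict[str, Store], down: Optional[str], column: str, cell: Cell) -> None:
    """Apply a write to every node except the one that is down (no hints)."""
    for name, store in replicas.items():
        if name != down:
            store[column] = cell


def merge(*stores: Store) -> Store:
    """Reconcile replicas column by column; the highest timestamp wins."""
    merged: Store = {}
    for store in stores:
        for column, cell in store.items():
            if column not in merged or cell[0] > merged[column][0]:
                merged[column] = cell
    return merged


def live_columns(store: Store) -> list:
    return sorted(c for c, (_, value) in store.items() if value is not None)


replicas: Dict[str, Store] = {"node1": {}, "node2": {}, "node3": {}}

# Insert four columns while every node is up.
for col in ("c1", "c2", "c3", "c4"):
    write(replicas, None, col, (1, "v"))

# Delete three of them, each time with a different node down.
write(replicas, "node1", "c1", (2, None))
write(replicas, "node2", "c2", (2, None))
write(replicas, "node3", "c3", (2, None))

# A QUORUM read merges two replicas; a CL.ONE-style read sees only one.
quorum_views = [
    live_columns(merge(replicas[a], replicas[b]))
    for a, b in combinations(replicas, 2)
]
single_views = [live_columns(store) for store in replicas.values()]
```

Under these assumptions every pair of replicas merges to the same view containing only `c4`, whereas each single replica still exposes one deleted column, which is what the test observes when the read degrades to one replica's data.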



--
This message was sent by Atlassian JIRA
(v6.1#6144)
