cassandra-commits mailing list archives

From "Jonathan Ellis (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3395) Quorum returns incorrect results during hinted handoff
Date Tue, 01 Nov 2011 17:29:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141357#comment-13141357 ]

Jonathan Ellis edited comment on CASSANDRA-3395 at 11/1/11 5:27 PM:
--------------------------------------------------------------------

This is tricky, but we figured out what's happening.

First, hinted handoff isn't needed to reproduce this, but bouncing nodes is: a node has to
miss an update for this to happen.

If that happens, you can get the following situation, as seen in Brandon's log:

All 3 nodes reply (not strictly necessary; all we need is two nodes that disagree with each
other):
{noformat}
DEBUG [ReadRepairStage:8] 2011-10-21 19:35:23,105 RowDigestResolver.java (line 62) resolving 3 responses
{noformat}
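For context, a QUORUM read only needs a strict majority of the replicas to answer. The Python sketch below shows that arithmetic. The function name `quorum_size` is hypothetical.

```python
def quorum_size(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation (a strict majority)."""
    return replication_factor // 2 + 1


# With RF=3, two replies satisfy QUORUM, so two disagreeing replicas are
# enough to trigger a digest mismatch even though all three replied here.
print(quorum_size(3))  # 2
```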

The responses don't match:
{noformat}
DEBUG [pool-2-thread-6] 2011-10-21 19:35:23,105 StorageProxy.java (line 615) Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(91747740688180627279175449712403223124, 747465737472617732) (6705a2ef7042fd98f2c30c5450d33e17 vs bf8c16eb98f3209d3abb723ee8c33185)
{noformat}
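The digest comparison can be modelled roughly as follows. This is a simplified illustration, not Cassandra's RowDigestResolver, and it rests on three assumptions:

- Each replica's response is modelled as a list of (name, value, timestamp) tuples.
- The digest is MD5 over a made-up serialization. The 32-hex-digit digests in the log are consistent with MD5, but the byte layout here is hypothetical.
- The helper names `digest` and `digests_match` are invented for this sketch.

```python
import hashlib
from typing import List, Tuple

Column = Tuple[int, bytes, int]  # (column name, value, timestamp): simplified model


def digest(columns: List[Column]) -> str:
    """MD5 over a naive serialization of the columns (hypothetical byte layout)."""
    h = hashlib.md5()
    for name, value, timestamp in columns:
        h.update(name.to_bytes(4, "big"))
        h.update(value)
        h.update(timestamp.to_bytes(8, "big"))
    return h.hexdigest()


def digests_match(responses: List[List[Column]]) -> bool:
    """True only if every replica's response hashes to the same digest."""
    return len({digest(r) for r in responses}) == 1


up_to_date = [(20172, b"x", 1319226304178304)]  # replica that saw the latest write
stale = [(20171, b"x", 1319226304173446)]       # replica that missed it
print(digests_match([up_to_date, up_to_date]))  # True
print(digests_match([up_to_date, stale]))       # False, so the coordinator fetches full data
```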

The coordinator requests the actual data from each replica and merges the responses:
{noformat}
DEBUG [pool-2-thread-6] 2011-10-21 19:35:23,107 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 00004ecb:false:4@1319226304173446
DEBUG [pool-2-thread-6] 2011-10-21 19:35:23,108 SliceQueryFilter.java (line 123) collecting 1 of 2147483647: 00004ecc:false:4@1319226304178304
{noformat}

Note that the column names are hex-encoded: 00004ecb = 20171 and 00004ecc = 20172. So both
columns are present, and the coordinator now has a slice of [20171, 20172]. It repairs the
missing data, then reverses the order as requested in the query and returns [20172, 20171]
to the client.
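The hex-to-decimal conversion is easy to check. The Python sketch below assumes a particular layout for the logged column field, which is our reading of the two "collecting" lines above and not a documented format:

- `name:deleted:valueLength@timestamp` is the overall shape.
- The name is a hex-encoded big-endian integer.

`parse_collected_column` is a hypothetical helper.

```python
def parse_collected_column(field: str) -> tuple:
    """Split a logged column such as '00004ecb:false:4@1319226304173446'.

    Assumed layout: hex name, deletion flag, value length, '@', timestamp.
    """
    head, timestamp = field.rsplit("@", 1)
    name_hex, deleted, value_length = head.split(":")
    return int(name_hex, 16), deleted == "true", int(value_length), int(timestamp)


print(parse_collected_column("00004ecb:false:4@1319226304173446"))
# (20171, False, 4, 1319226304173446)
print(parse_collected_column("00004ecc:false:4@1319226304178304")[0])  # 20172
```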

So the bug on the Cassandra side is that, after a digest mismatch, we don't re-apply the
requested count to the merged result set before sending it to the client.
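To make the missing step concrete, here is a simplified Python model of resolving a reversed slice after a digest mismatch. It assumes the query asks for count=1, i.e. "the latest column".

- `resolve_slice` and its `reapply_count` flag are hypothetical.
- Cassandra's actual resolution code is Java and structured differently.
- The log above shows the merge collecting with a count of 2147483647, effectively unbounded, which is why both columns survive unless the count is re-applied.

```python
from typing import Dict, List


def resolve_slice(responses: List[Dict[int, int]], count: int,
                  reversed_: bool, reapply_count: bool) -> List[int]:
    """Merge replica responses (column name -> timestamp) into a list of column names.

    Keeps the newest timestamp per column, sorts by name (descending if reversed_),
    and trims to the requested count only when reapply_count is True.
    """
    merged: Dict[int, int] = {}
    for response in responses:
        for name, ts in response.items():
            if name not in merged or ts > merged[name]:
                merged[name] = ts
    names = sorted(merged, reverse=reversed_)
    return names[:count] if reapply_count else names


stale = {20171: 1319226304173446}    # replica that missed the write of 20172
current = {20172: 1319226304178304}  # up-to-date replica; count=1 gave it only 20172
print(resolve_slice([stale, current], count=1, reversed_=True, reapply_count=False))
# [20172, 20171]  (the behaviour described above)
print(resolve_slice([stale, current], count=1, reversed_=True, reapply_count=True))
# [20172]
```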

Then the client calls popitem() on the result, which returns the *last* item in the result
set, i.e., 20171.
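The client side can be illustrated in Python. The sketch assumes the client exposes the slice as a `collections.OrderedDict` in the order the server returned the columns. The dictionaries below are illustrative and not taken from the attached scripts.

```python
from collections import OrderedDict

# Columns in the order the coordinator returns them for a reversed slice with count=1.
expected = OrderedDict([(20172, b"v")])                  # what a correct server returns
oversized = OrderedDict([(20172, b"v"), (20171, b"v")])  # what the buggy path returns

# popitem() removes and returns the *last* (name, value) pair.
print(expected.popitem()[0])   # 20172
print(oversized.popitem()[0])  # 20171: the stale column, which looks like a lost write
```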

In short: Cassandra needs to stop sending back more results than requested when different
versions on different nodes have to be resolved. We'll address this as part of the related
CASSANDRA-3303. Unfortunately, we know from experience that it is easy to introduce
regressions in this code path, so I don't think we can safely do this in 0.8; the fix will be
in 1.0.2+.

However, there is a simple workaround: clients can consume results from the front of the row
instead of the back. This is what the Telephus script did by accident, which is why Brandon
couldn't reproduce the bug with that version.
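The workaround can be sketched in the same style: read the first column of the reversed slice instead of popping the last one. `latest_column` is a hypothetical helper, and the sketch again assumes the result is an insertion-ordered mapping. Reading from the front returns the newest column whether or not the server trims the result to the requested count.

```python
from collections import OrderedDict
from typing import Mapping, Tuple


def latest_column(result: Mapping[int, bytes]) -> Tuple[int, bytes]:
    """Return the first (name, value) pair of a reversed slice, i.e. the newest column."""
    name = next(iter(result))
    return name, result[name]


oversized = OrderedDict([(20172, b"v"), (20171, b"v")])
print(latest_column(oversized)[0])  # 20172, despite the extra column from the server
```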
                
      was (Author: jbellis):
    This is tricky, but I think I know what's happening.
                  
> Quorum returns incorrect results during hinted handoff
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3395
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 0.8.8
>
>         Attachments: logs.tar.bz2, ttest.py, ttestraw.py
>
>
> In a 3-node cluster with RF=3 and a single coordinator, if monotonically increasing
> columns are inserted into a row and the latest one is sliced (both at QUORUM) during HH
> replay, occasionally this column will not be seen.

--
This message is automatically generated by JIRA.
