cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5113) RepairCallback breaks CL guarantees
Date Fri, 04 Jan 2013 17:28:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544044#comment-13544044
] 

Jonathan Ellis commented on CASSANDRA-5113:
-------------------------------------------

+1 on the fix.

-0 on the rename; the only time we do full-data reads is for repairs so I don't really see
an improvement there.

-1 on committing to 1.1; this behavior has been present since at least 1.0 and probably longer,
and changes to core StorageProxy code like this have a long track record of causing subtle
regressions.
                
> RepairCallback breaks CL guarantees
> -----------------------------------
>
>                 Key: CASSANDRA-5113
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5113
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.9
>
>         Attachments: 0001-Always-ensure-CL-after-a-digest-mismatch.txt, 0002-Rename-RowRepairResolver-to-RowDataResolver.txt
>
>
> RepairCallback does not validate the consistency level of the query. It seems that this
was done on purpose as the comments there says:
> {noformat}
> /**
>  * The main difference between this and ReadCallback is, ReadCallback has a ConsistencyLevel
>  * it needs to achieve.  Repair on the other hand is happy to repair whoever replies
within the timeout.
>  */
> {noformat}
> Concretely, the get() method of RepairCallback:
> * waits for all endpoints, even if there is more than strictly required by the CL.
> * if it timeouts, doesn't check it and always return a response.
> * for some reason, it returns null unless there is strictly more than 1 response.
> All of that seems wrong to me. The result of RepairCallback is what is returned to the
client in case of a digest mismatch on the first read. So we must ensure that the CL has been
reached. Also, returning null where there is 1 response (or none) seems clearly wrong.
> In fact I don't think we need a special callback for this "read all data" phase as it
is a "normal" read (the fact we do a first read with digests is just an "optimization"). The
only difference between the two phases should be how we resolve the responses (in the first
case we have digest and in the 2nd we don't) but that's handled by the resolver.
> So attaching a patch that removes RepairCallback and use ReadCallback instead.  I'm also
attaching a 2nd trivial patch that renames RowRepairResolver to RowDataResolver because I
think it describe better what this actually do (i.e.  the main goal is to resolve a full data
read to answer the right value to the client, repairing inconsistent nodes is secondary).
> The patch is against 1.1, because I think breaking CL guarantees is probably serious
enough to warrant pushing this to 1.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message