cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
Date Tue, 14 Mar 2017 23:41:42 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925260#comment-15925260
] 

Jason Brown commented on CASSANDRA-10726:
-----------------------------------------

[~xiaolong302@gmail.com] Thanks for the comments they add more context to the patch. Here's
my initial round of comments on your patch:

- general nit: code style is incorrect wrt braces
- rename {{ConsistencyLevel#isMajority}} to {{ConsistencyLevel#isQuorum}} as it's more in
line with our general naming conventions
- {{AbstractReadExecutor#getReadRepairExtra}} 
-- function needs a javadoc comment
-- rename {{#readRepairExtra}} to ... ??? "spareReadRepairNode". I would like to make this
Optional<InetAddress> as it makes it clearer that it's not required (rather than piggybacking
on "null" to indicate that sematic meaning).
-- {{readRepairExtra = allSortedReplicas.get(blockForReplicas.size());}} - this seems unsafe
as it assumes the {{List}}s are ordered the same.

- {{DataResolver}}
-- {{#reppairExtra}} is misspelled. Further, make it consistent with whatever we name it in
{{AbstractReadExecutor}}
-- {{#close}} - don't add the exception to the {{Tracing.trace()}} call, and it's debatable
if you need it on the {{logger.debug}} line, as well.
-- {{repairResponseRequestMap}} and {{#responseCntSnapshot}} - I don't think you actually
need these. You just care about the number of replicas that have responded, and you can just
call {{responses.size()}} where you actually care about (in {{#waitRepairToFinishWithPossibleRetry()}}.
-- {{#distinctHostNum()}} I'm not sure why you need this as we should only send the message
to distinct hosts. I think you can just use {{results.size()}} instead.
-- {{#waitRepairToFinishWithPossibleRetry}}
--- wrt the block starting at line 224. You iterate over all the timed out entries, but you
need to cover the full merge set and send that "repairExtra" node. You don't know what data
that replica has or needs, thus you need to send the full merged set. I'm not sure off the
top of my head where/how to grab the merged rows, but I'm sure you can figure it out ;)
--- Furthermore, once you get the merged set, you do not need to send multiple messages to
the target node, and can instead send one. Thus you can simplify the block starting at line
224 as such:
{code}
                Tracing.trace("retry read-repair-mutation to {}", reppairExtra);
                PartitionUpdate update = // get merged result from some location ...
                MessageOut messageOut = new Mutation(update).createMessage(MessagingService.Verb.READ_REPAIR);

                AsyncOneResponse response = MessagingService.instance().sendRR(messageOut,reppairExtra);
                response.get(waitTimeNanos, TimeUnit.NANOSECONDS); // TimeoutException will
be thrown 
		....
{code}

The biggest thing this patch needs is testing. You might be able to unit test this one (in
fact [~xiaolong302@gmail.com] spoke offline about some idea about how to do it), but it will
take some time (worthwhile, in my opinion). A dtest will probably be required, as well, even
though that will get tricky - byteman will probably necessary to help you out.

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert to update
out of date replicas is blocking. This means, if it fails, the read fails with a timeout.
If a node is dropping writes (maybe it is overloaded or the mutation stage is backed up for
some other reason), all reads to a replica set could fail. Further, replicas dropping writes
get more out of sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not be blocking
or we should return success for the read even if the write times out.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message