cassandra-commits mailing list archives

From "Xiaolong Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
Date Thu, 11 May 2017 16:19:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006704#comment-16006704 ]

Xiaolong Jiang commented on CASSANDRA-10726:
--------------------------------------------

Regarding distinctHostNum:
  /**
   * When doing read repair, the mutation is per partition key, so it's possible we will
   * repair multiple partitions on different hosts. Let's say RF = 5 and we need to read
   * partitions p1, p2, p3, p4 from three nodes, n1, n2, n3. If n1 contains the latest data,
   * n2 is missing p1 and p2, and n3 is missing p3 and p4, then we need to repair n2 by
   * sending it the p1 and p2 partitions and repair n3 by sending it the p3 and p4 partitions.
   * It's possible that the p1 and p3 repairs are both slow, in which case distinctHostNum
   * below will return 2. In this case, I will not retry on a new node, since the read repair
   * retry can only handle one slow host. If p3 and p4 are fast and the p1 and p2 repairs are
   * slow, or just the p1 repair is slow, distinctHostNum below will return 1. In this case,
   * I will retry on 1 extra node and send it p1 and p2, or just p1 if only the p1 read
   * repair timed out.
   * A single host can have multiple partition read repairs, and we can only handle one slow
   * host, so we should count the distinct hosts from the read repair response futures.
   */
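
To make the counting rule in the comment above concrete, here is a minimal standalone sketch. It is not the code from the patch: PendingRepair, ReadRepairRetrySketch, and shouldRetryOnExtraNode are hypothetical names introduced only for illustration, and hosts and partitions are plain strings rather than Cassandra's own types. It assumes each outstanding repair write is known by its target host, its partition, and whether it timed out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one outstanding read repair write
// (not a Cassandra class): target host, partition, and timeout status.
final class PendingRepair {
    final String host;
    final String partition;
    final boolean timedOut;

    PendingRepair(String host, String partition, boolean timedOut) {
        this.host = host;
        this.partition = partition;
        this.timedOut = timedOut;
    }
}

final class ReadRepairRetrySketch {
    // Number of distinct hosts that have at least one timed-out repair write.
    // Several slow partitions on the same host count as one slow host.
    static int distinctHostNum(List<PendingRepair> repairs) {
        Set<String> slowHosts = new HashSet<>();
        for (PendingRepair repair : repairs) {
            if (repair.timedOut) {
                slowHosts.add(repair.host);
            }
        }
        return slowHosts.size();
    }

    // Retry on one extra node only when exactly one host is slow,
    // since the retry described above can cover a single slow host.
    static boolean shouldRetryOnExtraNode(List<PendingRepair> repairs) {
        return distinctHostNum(repairs) == 1;
    }
}
```

With the example from the comment, slow p1 and p3 repairs (on n2 and n3) give a count of 2 and no retry, while slow p1 and p2 repairs (both on n2) give a count of 1 and a retry on one extra node.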

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert to update
> out of date replicas is blocking. This means, if it fails, the read fails with a timeout.
> If a node is dropping writes (maybe it is overloaded or the mutation stage is backed up for
> some other reason), all reads to a replica set could fail. Further, replicas dropping writes
> get more out of sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not be blocking
> or we should return success for the read even if the write times out.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

