cassandra-commits mailing list archives

From "ZhaoYang (Jira)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-15774) Improve range reads to query by endpoints instead of vnodes to reduce number of remote requests
Date Wed, 29 Apr 2020 16:29:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-15774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-15774:
---------------------------------
    Fix Version/s: 4.x

> Improve range reads to query by endpoints instead of vnodes to reduce number of remote
requests
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15774
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Coordination
>            Reporter: ZhaoYang
>            Priority: Normal
>             Fix For: 4.x
>
>
> Currently, range reads are queried in batches; see {{StorageProxy.RangeCommandIterator#sendNextRequests()}}.
For each batch, the coordinator computes a list of merged vnode ranges, up to the concurrency
factor, and queries each merged vnode range asynchronously. (Note: consecutive vnode ranges
can be merged if they share enough replicas to satisfy the consistency level requirement.)
> This works fine in general, but when the concurrency factor is high, either because the
returned row count is small compared to the query limit or because index filtering is used,
the coordinator may send too many concurrent remote range requests in a batch.
> We can improve this by grouping remote range requests by endpoint, so that each endpoint
returns a response covering multiple, possibly non-consecutive, ranges. With endpoint grouping,
the number of remote range requests should be greatly reduced, and it is always capped by the
number of nodes in the cluster rather than by the number of ranges, which is capped by the
concurrency factor.
> Let's look at an example on a 5-node cluster with 10 ranges (a,b,c,d,e,f,g,h,i,j) and
RF=3.
> The following is the range-to-replica mapping using round robin, which should work well
with the consecutive range merger (the merger doesn't work well with a fully random replica
mapping, because consecutive ranges are then less likely to have overlapping replicas):
> {code:java}
>    range-a replicas: 1, 2, 3
>    range-b replicas: 2, 3, 4
>    range-c replicas: 3, 4, 5
>    range-d replicas: 1, 4, 5
>    range-e replicas: 1, 2, 5
>    range-f replicas: 1, 2, 3
>    range-g replicas: 2, 3, 4
>    range-h replicas: 3, 4, 5
>    range-i replicas: 1, 4, 5
>    range-j replicas: 1, 2, 5
> {code}
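A minimal Python sketch of how the round-robin mapping above could be generated. The function name `round_robin_replicas` and the 1-based node numbering are illustrative assumptions, not part of Cassandra's token allocation code.

```python
from string import ascii_lowercase


def round_robin_replicas(num_ranges, num_nodes, rf):
    """Map each range to rf consecutive nodes, wrapping around the ring.

    Hypothetical helper for illustration only; nodes are numbered from 1.
    """
    mapping = {}
    for k in range(num_ranges):
        name = ascii_lowercase[k]
        # Range k starts at node (k mod num_nodes) + 1 and takes the next rf nodes.
        mapping[name] = sorted((k + i) % num_nodes + 1 for i in range(rf))
    return mapping


replicas = round_robin_replicas(num_ranges=10, num_nodes=5, rf=3)
for name, nodes in replicas.items():
    print(f"range-{name} replicas: {', '.join(map(str, nodes))}")
```

With 5 nodes the pattern repeats every 5 ranges, which is why range-f has the same replicas as range-a.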
> With the default range read implementation and the consecutive range merger, we need 10
replica read requests (2 for each merged range) for quorum:
> {code:java}
>      range (a,b] on node [2, 3]
>      range (c,d] on node [4, 5]
>      range (e,f] on node [1, 2]
>      range (g,h] on node [3, 4]
>      range (i,j] on node [1, 5]
> {code}
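The merged ranges listed above can be reproduced with a simplified greedy merger: keep extending the current merged range while the replicas shared by all of its members still number at least the quorum size. This is a sketch under that assumption only; the names `merge_consecutive` and `REPLICAS` are hypothetical, and Cassandra's actual merger may apply further conditions that are not modelled here.

```python
# Range-to-replica mapping copied from the example in the ticket.
REPLICAS = {
    "a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}, "d": {1, 4, 5}, "e": {1, 2, 5},
    "f": {1, 2, 3}, "g": {2, 3, 4}, "h": {3, 4, 5}, "i": {1, 4, 5}, "j": {1, 2, 5},
}


def merge_consecutive(replicas, block_for):
    """Greedily merge consecutive ranges that share at least block_for replicas."""
    merged = []
    current, common = [], set()
    for name, nodes in replicas.items():
        if current and len(common & nodes) >= block_for:
            # Enough shared replicas remain: extend the current merged range.
            current.append(name)
            common &= nodes
        else:
            if current:
                merged.append((current, sorted(common)))
            current, common = [name], set(nodes)
    if current:
        merged.append((current, sorted(common)))
    return merged


rf = 3
quorum = rf // 2 + 1  # 2 replicas for RF=3
merged = merge_consecutive(REPLICAS, quorum)

total_requests = 0
for names, nodes in merged:
    targets = nodes[:quorum]  # one read request per chosen replica
    total_requests += len(targets)
    print(f"range ({names[0]},{names[-1]}] on node {targets}")
print("total replica read requests:", total_requests)
```

Each merged range stops after two members in this example because adding a third range would leave only one shared replica, which is below quorum.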
> With queries grouped by endpoint, we only need 4 replica read requests for quorum:
> {code:java}
>     * node 1: a, d, e, f, i, j
>     * node 2: a, b, e, f, g, j
>     * node 3: b, c, g, h
>     * node 4: c, d, h, i
> {code}
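A small Python sketch of the grouping step, assuming each range simply picks its lowest-numbered replicas until quorum is met. That selection rule is an assumption chosen because it reproduces the listing above; the ticket does not specify how replicas would be chosen, and `group_by_endpoint` is a hypothetical name.

```python
from collections import defaultdict

# Range-to-replica mapping copied from the example in the ticket.
REPLICAS = {
    "a": [1, 2, 3], "b": [2, 3, 4], "c": [3, 4, 5], "d": [1, 4, 5], "e": [1, 2, 5],
    "f": [1, 2, 3], "g": [2, 3, 4], "h": [3, 4, 5], "i": [1, 4, 5], "j": [1, 2, 5],
}


def group_by_endpoint(replicas, block_for):
    """Assign each range to block_for replicas and group the ranges per node.

    Assumption: the lowest-numbered replicas are preferred for every range.
    """
    groups = defaultdict(list)
    for name, nodes in replicas.items():
        for node in sorted(nodes)[:block_for]:
            groups[node].append(name)
    return dict(sorted(groups.items()))


groups = group_by_endpoint(REPLICAS, block_for=2)
for node, names in groups.items():
    print(f"node {node}: {', '.join(names)}")
print("total replica read requests:", len(groups))
```

Because there is one request per contacted node, the request count can never exceed the cluster size, however many ranges are in the batch.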
>  
> Note that there are some complexities around short-read protection, which needs to know
whether a replica has more rows available for the current range.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

