cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
Date Mon, 02 Jan 2012 14:04:32 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sylvain Lebresne updated CASSANDRA-1600:
----------------------------------------

    Attachment: 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch
                0002-thrift-generated-code-changes-v4.patch
                0001-Add-optional-index-clause-to-KeyRange-v4.patch

Attaching rebased and updated v4 (I've squashed the CQL changes into the 3rd patch).

bq. What do we gain from "typedefing" List<IndexExpression> to FilterClause? (I note
this was part of Stu and my original attempts back in April but I don't remember a good reason
for that.)

I do not really see a good reason either (I rebased that part without really thinking about
it). I've removed FilterClause in v4.

{quote}
{noformat}
+        /*
+         * XXX: If the range requested is a token range, we'll have to start at the beginning
(and stop at the end) of
+         * the indexed row unfortunately (which will be inefficient), because we have not
way to intuit the small
+         * possible key having a given token. A fix would be to actually store the token
along the key in the
+         * indexed row.
+         */
{noformat}
This is fine since there's no reason to be searching by token unless you're doing an exhaustive
scan, i.e. a m/r job.
{quote}

Actually, I believe this does have some impact. When a requested range of keys spans multiple
nodes, the range is splitted using nodes Token (by SP.getRestrictedRanges()). And while it
is true that in simple cases this amount to checking all the keys in the node and thus is
not inefficient, there is at least 2 cases where this isn't the case:
* If the ranges a node is responsible of are not contiguous. Say a node is responsible for
range (5, 10] and (20, 30]. And say a user requests a range of keys such that the start key
token is 15 (and the end key token is say 25). Then the node will end up being requested for
keys in (20, 25] (where those are tokens, not keys). In that case, it will have to uselessly
scan (and skip) all keys in (0, 10] the node has.
* When token range wraps. This is actually the same problem than above in that when a node
has a range like (2^127-1, 2^126], it really hold both ranges (0, 2^126] and (2^127, 2^127-1]
which are non contiguous as far as the ordering on disk is concerned.
While the 'the strategy creates non-contiguous ranges' problem is not a huge deal since neither
SimpleStrategy nor NTS creates them (if I'm correct), the instance of the problem due to wrapping
ranges is a very real inefficiency of the implementation.

As a side note, storing the token as suggested in the comments above would also likely have
the benefit of helping a good deal the problem of the time spent doing digest computations
noted in CASSANDRA-3545.

{quote}
{noformat}
+                        rows.addAll(RangeSliceVerbHandler.executeLocally(command));
{noformat}
Another place the original patches failed... we should avoid this because it means we're now
allowing one range scan per thrift client instead of one per read stage thread, and it bypasses
the "drop hopeless requests" overcapacity protection built in there. Look at SP.LocalReadRunnable
for how to do this safely. Simplest fix would be to just continue routing all range scans
over MessagingService.
{quote}

While I agree with you, I'll note that this is not a problem introduced by this patch: the
current code does a call to CFS.getRangeSlice() directly in that case (bypassing the read
stage and the protections coming with it). We clearly should fix that but it's neither a detail
nor related to this patch so I suggest spawning another ticket for that instead.
                
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
>                 Key: CASSANDRA-1600
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Stu Hood
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1
>
>         Attachments: 0001-Add-optional-FilterClause-to-KeyRange-and-support-do-v2.patch,
0001-Add-optional-FilterClause-to-KeyRange-and-support-doin.txt, 0001-Add-optional-FilterClause-to-KeyRange-v3.patch,
0001-Add-optional-index-clause-to-KeyRange-v4.patch, 0002-allow-get_range_slices-to-apply-filter-to-a-sequenti-v2.patch,
0002-allow-get_range_slices-to-apply-filter-to-a-sequential.txt, 0002-thrift-generated-code-changes-v3.patch,
0002-thrift-generated-code-changes-v4.patch, 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v3.patch,
0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch, 0004-Update-cql-to-not-use-deprecated-index-scan-v3.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning
behind using 'KeyRange' for get_range_slices applies there as well, since if you know the
range you care about in the primary index, you don't want to continue scanning until you exhaust
'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange, why not
smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message