cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1337) parallelize fetching rows for low-cardinality indexes
Date Thu, 26 Jul 2012 09:44:35 GMT


Sylvain Lebresne commented on CASSANDRA-1337:

The patch moves the check for "do we have enough rows for the query" inside the "don't
use the local path for that range" branch, which is broken (as in "uselessly inefficient"), because
it means that if we do have enough rows locally, we will still query another range.

But probably more importantly, I'm confused about how this patch works.

First, it estimates how many rows the query of *a range* will likely yield based on the *total*
number of keys on the node. We should at least divide this by the replication factor to get
a proper estimate of the number of keys per range.
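As a rough sketch of that correction (the names here are illustrative, not Cassandra's actual API):

```java
// Hypothetical sketch: estimate the keys a single range will contribute by
// dividing the node's total key estimate by the replication factor, since a
// node holds roughly RF ranges' worth of data.
public final class RangeEstimate
{
    public static long estimatedKeysPerRange(long estimatedTotalKeys, int replicationFactor)
    {
        // Never estimate below one key, to avoid a zero concurrency factor downstream.
        return Math.max(1, estimatedTotalKeys / replicationFactor);
    }
}
```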

Second, it seems to correctly parallelize "old-style" range_slice (i.e. range_slice for thrift
without any IndexExpression), but I think it correctly handles neither secondary indexes
(which were the main goal of the patch, I believe) nor any kind of range slice when CQL3 is used:
* For secondary index queries, the number of rows returned doesn't depend at all on the number
of row keys the node holds (and thus it follows that estimating the number of parallel queries
to do based on that parameter is broken); it depends on how many columns the row for the most
selective index contains. So for the concurrencyFactor we should 1) figure out which index
will be used and 2) probably use the estimated mean columns count for that index.
* CQL3 queries use the maxIsColumns parameter (for both traditional *and* 2ndary
index queries), so the number of rows returned shouldn't be directly compared to maxResults
(i.e. if the first row we find has enough columns to satisfy maxResults, we're done). In that
case, it unfortunately becomes more complicated to predict how many "results" a query might
yield in general, because this depends on the column filter. I.e. if the filter is a name
filter, or an "identity" slice filter (as in IdentitySliceFilter), we can attempt an estimate
(in the latter case, something like maxResults / (estimatedKeys * meanColumnsCountPerKey)),
but for other kinds of slice filters, I don't think we can estimate much. That being said,
it might still be worth using the estimate in those two cases, because at least for 2ndary
index queries, the column filter will very often be an "identity" slice. And in that
"identity" slice case, for 2ndary index and CQL3, the estimation should be something like
maxResults / (meanColumnCountPerKey(index used) * meanColumnCountPerKey(parent cf)).
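To make those two estimates concrete, here is a rough sketch (all names are illustrative, not Cassandra's actual API; the formulas are only the ones suggested above):

```java
// Hypothetical sketch of the two concurrencyFactor estimates discussed above.
public final class ConcurrencyFactor
{
    // Thrift-style 2ndary index query: results come from the index row, so base
    // the estimate on the mean columns count of the index that will be used.
    public static int forSecondaryIndex(int rowsWanted, double meanIndexColumnsPerKey)
    {
        return (int) Math.max(1, Math.ceil(rowsWanted / Math.max(1.0, meanIndexColumnsPerKey)));
    }

    // CQL3 "identity" slice over a 2ndary index: maxResults counts columns, so
    // divide by the columns each matching index entry is expected to yield.
    public static int forCql3IdentitySlice(int maxResults,
                                           double meanIndexColumnsPerKey,
                                           double meanParentColumnsPerKey)
    {
        double columnsPerIndexEntry = Math.max(1.0, meanIndexColumnsPerKey * meanParentColumnsPerKey);
        return (int) Math.max(1, Math.ceil(maxResults / columnsPerIndexEntry));
    }
}
```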

I'll note however that in both cases, the meanColumnsCount is not necessarily the perfect estimate
to use, as it pretty much implies that half the time we will query one more range than necessary.
Instead, we could either use the maxColumnsCount (if we really want to be conservative) or
add some fudge factor to the mean, a fudge factor that could perhaps be based on the difference
between the min, mean and max columns count estimates.
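For instance, one simple such fudge (purely illustrative, not a concrete proposal):

```java
// Hypothetical sketch: pad the mean columns-count estimate with part of the
// spread between mean and max, so we less often under-estimate and have to
// query one more range than necessary.
public final class FudgedEstimate
{
    public static double fudgedColumnsCount(double meanCount, double maxCount)
    {
        // Halfway between mean and max: more conservative than the mean,
        // less pessimistic than the max.
        return meanCount + (maxCount - meanCount) / 2.0;
    }
}
```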
> parallelize fetching rows for low-cardinality indexes
> -----------------------------------------------------
>                 Key: CASSANDRA-1337
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>            Priority: Minor
>             Fix For: 1.2
>         Attachments: 0001-CASSANDRA-1337-scan-concurrently-depending-on-num-rows.txt,
>   Original Estimate: 8h
>  Remaining Estimate: 8h
> currently, we read the indexed rows from the first node (in partitioner order); if that
does not have enough matching rows, we read the rows from the next, and so forth.
> we should use the statistics from CASSANDRA-1155 to query multiple nodes in parallel,
such that we have a high chance of getting enough rows w/o having to do another round of queries
(but, if our estimate is incorrect, we do need to loop and do more rounds until we have enough
data or we have fetched from each node).
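For reference, the retry loop described in the issue might look roughly like this (illustrative only: it runs the per-round queries sequentially rather than actually in parallel, and none of these names are real Cassandra code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the loop from the issue description: issue up to
// concurrencyFactor range queries per round; if the estimate was short, do
// another round until enough rows are gathered or all ranges are exhausted.
public final class ParallelRangeFetch
{
    public static <R, Row> List<Row> fetch(Iterator<R> ranges,
                                           Function<R, List<Row>> queryRange,
                                           int rowsWanted,
                                           int concurrencyFactor)
    {
        List<Row> rows = new ArrayList<>();
        while (rows.size() < rowsWanted && ranges.hasNext())
        {
            // One "round": query up to concurrencyFactor ranges (the real
            // patch would fire these queries concurrently).
            for (int i = 0; i < concurrencyFactor && ranges.hasNext(); i++)
                rows.addAll(queryRange.apply(ranges.next()));
        }
        return rows;
    }
}
```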

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.

