cassandra-commits mailing list archives

From Andrés de la Peña (JIRA) <>
Subject [jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
Date Thu, 30 Apr 2015 16:29:10 GMT


Andrés de la Peña commented on CASSANDRA-8717:

I have uploaded a new version of the patch with the changes that need to be addressed before
including this in 2.1.

As you suggested, I have added a boolean argument named {{trace}} to {{SIS#highestSelectivityPredicate}}
to indicate whether or not the tracing event should be emitted. It is set to true by {{SIS#search}},
and false by {{SIS#highestSelectivityIndex}}.

To avoid duplication in {{SecondaryIndexManager}}, the search method now calls {{getHighestSelectivityIndexSearcher}}.

I have modified the JavaDoc of {{SIS#postReconciliationProcessing}} to make it clear that
it runs on the coordinator node.

I hope you find it OK.

> Top-k queries with custom secondary indexes
> -------------------------------------------
>                 Key: CASSANDRA-8717
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrés de la Peña
>            Assignee: Andrés de la Peña
>            Priority: Minor
>              Labels: 2i, secondary_index, sort, sorting, top-k
>             Fix For: 3.x
>         Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch, 0002-Add-support-for-top-k-queries-in-2i.patch,
0003-Add-support-for-top-k-queries-in-2i.patch, 0004-Add-support-for-top-k-queries-in-2i.patch
> As presented in [Cassandra Summit Europe 2014|],
secondary indexes can be modified to support general top-k queries with minimal changes to the
Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting
by columns, etc.
> Top-k queries retrieve the k best results for a certain query. That implies querying
the k best rows in each token range and then sorting them to obtain the k globally best rows.
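
The merge step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{TopKMergeSketch}}, {{ScoredRow}} and {{topK}} are hypothetical names that do not appear in Cassandra or in the attached patch, and a plain numeric score stands in for whatever ordering a custom index would define.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names come from Cassandra or the patch.
final class TopKMergeSketch
{
    // Hypothetical stand-in for a row carrying a relevance score.
    static final class ScoredRow
    {
        final String key;
        final double score;

        ScoredRow(String key, double score)
        {
            this.key = key;
            this.score = score;
        }
    }

    // Merges the per-range partial results and keeps the k highest-scoring rows.
    // Assumes k is non-negative.
    static List<ScoredRow> topK(List<List<ScoredRow>> perRangeResults, int k)
    {
        List<ScoredRow> merged = new ArrayList<>();
        for (List<ScoredRow> partial : perRangeResults)
            merged.addAll(partial);

        // Highest score first.
        merged.sort((a, b) -> Double.compare(b.score, a.score));

        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

Each token range only needs to contribute its own k best rows, since no row outside a range's local top k can be in the global top k.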
> For doing that, we propose two additional methods in class SecondaryIndexSearcher:
> {code:java}
> public boolean requiresFullScan(List<IndexExpression> clause)
> {
>     return false;
> }
> public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
> {
>     return rows;
> }
> {code}
> The first one indicates whether a query performed on the index requires querying all the nodes
in the ring. This is necessary in top-k queries because we do not know in advance which nodes hold the best
results. The second method specifies how to sort all the partial node results according to
the query.
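
As a rough illustration of how a custom index might override these two hooks, consider the sketch below. The nested {{IndexExpression}}, {{Row}} and {{SecondaryIndexSearcher}} types are simplified stand-ins written for this example, not the real Cassandra classes, and {{RelevanceSearcher}} with its numeric score is a hypothetical implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class SearcherSketch
{
    // Simplified stand-ins, NOT the real Cassandra classes.
    static final class IndexExpression {}

    static final class Row
    {
        final String key;
        final double score; // assumed relevance score, for illustration

        Row(String key, double score)
        {
            this.key = key;
            this.score = score;
        }
    }

    // Mirrors the two default methods proposed in the description.
    static class SecondaryIndexSearcher
    {
        public boolean requiresFullScan(List<IndexExpression> clause)
        {
            return false;
        }

        public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
        {
            return rows;
        }
    }

    // Hypothetical custom searcher that ranks rows by relevance.
    static class RelevanceSearcher extends SecondaryIndexSearcher
    {
        @Override
        public boolean requiresFullScan(List<IndexExpression> clause)
        {
            // Any node may hold one of the best rows, so all must be queried.
            return true;
        }

        @Override
        public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
        {
            // Sort a copy so the caller's list is left untouched.
            List<Row> sorted = new ArrayList<>(rows);
            sorted.sort((a, b) -> Double.compare(b.score, a.score));
            return sorted;
        }
    }
}
```

With the defaults, existing index implementations keep their current behaviour; only searchers that opt in pay for the full scan.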
> Then we add two similar methods to the class AbstractRangeCommand:
> {code:java}
>     this.searcher =;
> public boolean requiresFullScan() {
>     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
> }
> public List<Row> combine(List<Row> rows)
> {
>     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
> }
> {code}
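
The order of operations in {{combine}} matters: rows are sorted first and trimmed afterwards, so the limit is applied to the globally ordered list. The sketch below illustrates this with integers in place of rows. It assumes that {{trim}} simply keeps the first {{limit}} elements, which the description does not state, and {{RangeCommandSketch}} and {{Sorter}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeCommandSketch
{
    // Hypothetical stand-in for the searcher's sort hook.
    interface Sorter
    {
        List<Integer> sort(List<Integer> rows);
    }

    private final Sorter searcher; // null when no index searcher applies
    private final int limit;       // assumed query limit, non-negative

    RangeCommandSketch(Sorter searcher, int limit)
    {
        this.searcher = searcher;
        this.limit = limit;
    }

    // Assumption: trim keeps the first `limit` rows.
    List<Integer> trim(List<Integer> rows)
    {
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    // Same shape as the proposed combine: sort (if possible), then trim.
    List<Integer> combine(List<Integer> rows)
    {
        return searcher == null ? trim(rows) : trim(searcher.sort(rows));
    }
}
```

Trimming before sorting would instead keep whichever rows happened to arrive first, which could discard rows that belong in the top k.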
> Finally, we modify StorageProxy#getRangeSlice to use these methods, as shown in
the attached patch.
> We think that the proposed approach provides very useful functionality with minimal impact
on the current codebase.

