cassandra-commits mailing list archives

From "Robbie Strickland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes
Date Mon, 02 Feb 2015 16:53:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301452#comment-14301452
] 

Robbie Strickland commented on CASSANDRA-8717:
----------------------------------------------

[~iamaleksey] Have you looked at the patch?  There's barely anything to it, and yet it opens
up the door for guys like Stratio to plug in more advanced index implementations without breaking
anything (i.e. no need for their fork, which is a good thing).  Plus who knows when 3.0 will
go mainstream?  I think you should reconsider, or at least get some other input.

> Top-k queries with custom secondary indexes
> -------------------------------------------
>
>                 Key: CASSANDRA-8717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrés de la Peña
>            Assignee: Andrés de la Peña
>            Priority: Minor
>              Labels: 2i, secondary_index, sort, sorting, top-k
>             Fix For: 3.0
>
>         Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch
>
>
> As presented in [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M],
secondary indexes can be modified to support general top-k queries with minimal changes to
the Cassandra codebase. This way, custom 2i implementations could provide relevance search,
sorting by columns, etc.
> Top-k queries retrieve the k best results for a certain query. That implies querying
the k best rows in each token range and then sorting them in order to obtain the k globally
best rows.
> For doing that, we propose two additional methods in class SecondaryIndexSearcher:
> {code:java}
> public boolean requiresFullScan(List<IndexExpression> clause)
> {
>     return false;
> }
> public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
> {
>     return rows;
> }
> {code}
> The first one indicates whether a query performed on the index requires querying all the
nodes in the ring. This is necessary for top-k queries because we do not know in advance which
nodes hold the best results. The second method specifies how to sort all the partial node
results according to the query.
> Then we add two similar methods to the class AbstractRangeCommand:
> {code:java}
> // In the constructor: look up the index searcher for this command's row filter.
>     this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);
> public boolean requiresFullScan()
> {
>     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
> }
> public List<Row> combine(List<Row> rows)
> {
>     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
> }
> {code}
> Finally, we modify StorageProxy#getRangeSlice to use these methods, as shown in
the attached patch.
> We think that the proposed approach provides very useful functionality with minimal impact
on the current codebase.
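
The scatter-gather idea in the quoted description (take the k best rows from each token range, then sort the partial results and trim to k) can be sketched in plain Java. This is only an illustration under stated assumptions, not the attached patch: ScoredRow, localTopK and the score field are hypothetical stand-ins, and nothing here uses Cassandra's Row, IndexExpression or StorageProxy types. The combine method only mirrors the sort-then-trim shape of the method of the same name in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopKSketch {
    /** Hypothetical stand-in for a row: a key plus a relevance score. */
    static final class ScoredRow {
        final String key;
        final double score;
        ScoredRow(String key, double score) { this.key = key; this.score = score; }
    }

    /** Best-first order: higher score first, ties broken by key for determinism. */
    static final Comparator<ScoredRow> BEST_FIRST =
        Comparator.<ScoredRow>comparingDouble(r -> r.score).reversed()
                  .thenComparing(r -> r.key);

    /** What each token range would return: its own k best rows (hypothetical helper). */
    static List<ScoredRow> localTopK(List<ScoredRow> rangeRows, int k) {
        List<ScoredRow> sorted = new ArrayList<>(rangeRows);
        sorted.sort(BEST_FIRST);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Coordinator side: gather every partial result, sort globally, trim to k. */
    static List<ScoredRow> combine(List<List<ScoredRow>> partials, int k) {
        List<ScoredRow> all = new ArrayList<>();
        for (List<ScoredRow> partial : partials) {
            all.addAll(partial);
        }
        all.sort(BEST_FIRST);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        int k = 2;
        // Three made-up token ranges; a top-k query has to visit all of them.
        List<ScoredRow> rangeA = Arrays.asList(
            new ScoredRow("a1", 0.9), new ScoredRow("a2", 0.4), new ScoredRow("a3", 0.7));
        List<ScoredRow> rangeB = Arrays.asList(
            new ScoredRow("b1", 0.8), new ScoredRow("b2", 0.95));
        List<ScoredRow> rangeC = Arrays.asList(new ScoredRow("c1", 0.1));

        List<List<ScoredRow>> partials = Arrays.asList(
            localTopK(rangeA, k), localTopK(rangeB, k), localTopK(rangeC, k));

        for (ScoredRow row : combine(partials, k)) {
            System.out.println(row.key + " " + row.score);
        }
        // prints "b2 0.95" then "a1 0.9"
    }
}
```

Asking each range for only k rows is enough because any row in the global top k is also among the k best of its own range, so the coordinator never needs more than k rows per range to produce the final answer.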



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
