cassandra-commits mailing list archives

From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9459) SecondaryIndex API redesign
Date Wed, 19 Aug 2015 16:38:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703305#comment-14703305 ]

Sam Tunnicliffe edited comment on CASSANDRA-9459 at 8/19/15 4:38 PM:
---------------------------------------------------------------------

[~sbtourist], in response to your comments (sorry for the delay):

bq. It seems we've lost CASSANDRA-9196.

The change in CASSANDRA-9196 was necessary because each Index defined in schema was automatically
registered with {{SecondaryIndexManager}}. So even if a particular custom index would not
participate in any indexing or search activity on a certain node, due to external configuration
or whatnot, its mere presence would mean that whenever new SSTables were loaded we would perform
an expensive, and possibly pointless, iteration through them. This shouldn't happen anymore,
as the decision whether to register an index is now the responsibility of the index itself,
so it can make that choice based on whatever criteria are necessary.
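
As a rough illustration of the self-registration idea, here is a minimal sketch. Every name in it
({{IndexRegistry}}, {{SketchIndex}}, {{ConfigurableIndex}}, the {{enabledOnThisNode}} flag) is
hypothetical and chosen for the example; none of it is the actual Cassandra interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the manager that tracks active indexes.
interface IndexRegistry {
    void registerIndex(SketchIndex index);
}

// Hypothetical index contract: the index, not the manager, decides
// whether it takes part in indexing on this node.
interface SketchIndex {
    String name();

    void register(IndexRegistry registry);
}

// Registry that simply records the names of the indexes that opted in.
final class SimpleRegistry implements IndexRegistry {
    final List<String> registered = new ArrayList<>();

    @Override
    public void registerIndex(SketchIndex index) {
        registered.add(index.name());
    }
}

// Index whose participation depends on external configuration
// (modelled here as a plain boolean, which is an assumption).
final class ConfigurableIndex implements SketchIndex {
    private final String name;
    private final boolean enabledOnThisNode;

    ConfigurableIndex(String name, boolean enabledOnThisNode) {
        this.name = name;
        this.enabledOnThisNode = enabledOnThisNode;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void register(IndexRegistry registry) {
        // An index that is disabled here never registers, so the registry
        // has no reason to iterate new SSTables on its behalf.
        if (enabledOnThisNode) {
            registry.registerIndex(this);
        }
    }
}
```

With this shape, an index that is present in schema but inactive on a node never appears in the
registry at all.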


bq. It would be useful to distinguish between a cleanup and a compaction at the Indexer level,
as indexes not backed by CFs will probably do nothing during compaction.

{{SecondaryIndexManager.TransactionType}} now allows impls to distinguish between {{WRITE_TIME}},
{{COMPACTION}} and {{CLEANUP}} transactions.
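
A sketch of how an implementation might branch on the transaction type. The three constant names
come from the comment above, but the enum is redeclared here as a standalone stand-in, and
{{ExternalIndexer}} and {{onRowRemoved}} are hypothetical names, not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone stand-in for the transaction type described above.
enum TransactionType { WRITE_TIME, COMPACTION, CLEANUP }

// Hypothetical indexer for an index that is not backed by a CF.
final class ExternalIndexer {
    // Records what would be sent to the external index.
    final List<String> actions = new ArrayList<>();

    void onRowRemoved(String rowKey, TransactionType type) {
        switch (type) {
            case COMPACTION:
                // Assumption: the external store needs no work when
                // SSTables are merely rewritten.
                break;
            case CLEANUP:
                // The data no longer belongs to this node.
                actions.add("purge " + rowKey);
                break;
            case WRITE_TIME:
                // A regular write removed or replaced the row.
                actions.add("delete " + rowKey);
                break;
        }
    }
}
```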

bq. Cells#reconcile doesn't call Indexer#updateCell in case of counters, but what if a third-party
implementation wants to index them?

Indexes are not supported on counter columns directly. That said, the latest version changes
the way updates are collected by {{WriteTimeTransaction}} with the effect that counter columns
will be present in the Rows supplied to registered indexers.

bq. SIM#indexPartition seems to miss invoking Indexer#finish.

Thanks, good catch.
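
One way to make this kind of omission harder to repeat is to put the final call in a {{finally}}
block. The sketch below only illustrates that pattern; {{PartitionIndexer}}, {{begin}},
{{insertRow}} and {{SketchIndexManager}} are hypothetical names, and it does not claim to show how
the fix was actually made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-partition indexer lifecycle: begin, rows, finish.
interface PartitionIndexer {
    void begin();

    void insertRow(String row);

    void finish();
}

// Test double that records the order of lifecycle calls.
final class RecordingIndexer implements PartitionIndexer {
    final List<String> events = new ArrayList<>();

    @Override
    public void begin() {
        events.add("begin");
    }

    @Override
    public void insertRow(String row) {
        events.add("insert " + row);
    }

    @Override
    public void finish() {
        events.add("finish");
    }
}

// Hypothetical manager showing only the call ordering.
final class SketchIndexManager {
    void indexPartition(List<String> rows, PartitionIndexer indexer) {
        indexer.begin();
        try {
            for (String row : rows) {
                indexer.insertRow(row);
            }
        } finally {
            // Runs even if insertRow throws, so finish is never skipped.
            indexer.finish();
        }
    }
}
```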

On the subsequent comment regarding CASSANDRA-8717, I haven't had a chance yet but I'll dig
further into that shortly.






> SecondaryIndex API redesign
> ---------------------------
>
>                 Key: CASSANDRA-9459
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9459
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 1
>
>
> For some time now the index subsystem has been a pain point and in large part this is
> due to the way that the APIs and principal classes have grown organically over the years.
> It would be a good idea to conduct a wholesale review of the area and see if we can come up
> with something a bit more coherent.
> A few starting points:
> * There's a lot in AbstractPerColumnSecondaryIndex & its subclasses which could be
>   pulled up into SecondaryIndexSearcher (note that to an extent, this is done in CASSANDRA-8099).
> * SecondaryIndexManager is overly complex and several of its functions should be simplified/re-examined.
>   The handling of which columns are indexed and index selection on both the read and write paths
>   are somewhat dense and unintuitive.
> * The SecondaryIndex class hierarchy is rather convoluted and could use some serious
>   rework.
> There are a number of outstanding tickets which we should be able to roll into this higher
> level one as subtasks (but I'll defer doing that until getting into the details of the redesign):
> * CASSANDRA-7771
> * CASSANDRA-8103
> * CASSANDRA-9041
> * CASSANDRA-4458
> * CASSANDRA-8505
> Whilst they're not hard dependencies, I propose that this be done on top of both CASSANDRA-8099
> and CASSANDRA-6717. The former largely because the storage engine changes may facilitate a
> friendlier index API, but also because of the changes to SIS mentioned above. As for 6717,
> the changes to schema tables there will help facilitate CASSANDRA-7771.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
