cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9459) SecondaryIndex API redesign
Date Fri, 21 Aug 2015 15:18:03 GMT


Sylvain Lebresne commented on CASSANDRA-9459:

bq. The metric isn't so much how many of the original expressions can an Index handle, but
the amount of filtering left to do on the results of the index lookup.

Doesn't really change my point :). My point is that the filtering post-index query is relatively
cheap, and what we want to favor first and foremost is to minimize the number of results returned
by the index. If an index A returns 10 rows with 3 more expressions to check and another index
B returns 1000 rows with only 1 expression to check, you want to choose A every single time,
not B (unless of course B is able to return those 1000 rows more than 100x more quickly that
A takes to return its 10 rows but we have absolutely no metric regarding that). I wouldn't
mind future-proofing per-se, but I'm saying that considering the number of expressions left
by an index _before_ considering the estimate of the number of rows it returns is just a bad
heuristic, now or in the future.

bq. if we agree this needs more consideration

Per what's above, I think it does.

bq. I'd prefer to keep register separate if you don't object too strongly.

I don't.

bq. isn't it conceivable that a custom index could radically transform a filter into one containing
an entirely disjoint set of expressions from the original?

Sure, why not. Still, if the point is that the filter can radically change, then "Reduced"
is still a misleading name. What about something like {{getPostIndexQueryFilter}}?

Regarding the {{Optional}} return of that method, I still don't like it (I wouldn't like a
{{null}} return either), because it basically feels we're adding a meaning to the method (whether
the filter is handled at all by the index) which is not really implied by the method name.
In other words, I think the clean way would be to have a separate boolean {{isHandled()}}
method, but I understand this would duplicate work and we want to avoid this. So anyway, not
a big deal, let's stick with the {{Optional}}.

bq. I don't quite get what you mean by "it will insert the cells instead of removing them
for instance", as the tombstone has no value there's nothing we can remove

Well, internally a tombstone has a value, it's an empty byte buffer :) (so in practice, having
a tombstone in a row passed to {{insertRow()}} would have triggered the insertion of an index
entry, albeit a broken entry).

Anyway, I really did just mean that we should skip tombstones as is currently done in the
{{SecondaryIndexManager.Updater}} implementations but wasn't done in the patch. And while
I see you've added it to {{indexCell}}, we need it in {{removeCell}} too (nitpick: I would
add the {{isLive}} test as a "or" with the {{cell == null}} test).

bq. I also think we {{removeRow}} is still needed for cleanup & compaction

Sure, you're right of course, I brainfarted.

bq. I'm not sure what you mean about 'idxName', looks like it's used to me

Meant that it wasn't used in the equality test, which was {{indexer.getIndexName().equals(cfName)}}
and not, as I would have expected {{indexer.getIndexName().equals(idxName)}}. Anyway, your
new version doesn't have this problem so I'm good.

bq. abstracting the registry-ness from {{SIM}} makes it much easier to use a lightweight implementation
for tests

Fair enough. I guess if we keep {{register}} then that makes a bit more sense. Wasn't a big
deal anyway, it's not illogical at all, I just wanted to be sure I wasn't missing something.

> SecondaryIndex API redesign
> ---------------------------
>                 Key: CASSANDRA-9459
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 1
> For some time now the index subsystem has been a pain point and in large part this is
due to the way that the APIs and principal classes have grown organically over the years.
It would be a good idea to conduct a wholesale review of the area and see if we can come up
with something a bit more coherent.
> A few starting points:
> * There's a lot in AbstractPerColumnSecondaryIndex & its subclasses which could be
pulled up into SecondaryIndexSearcher (note that to an extent, this is done in CASSANDRA-8099).
> * SecondayIndexManager is overly complex and several of its functions should be simplified/re-examined.
The handling of which columns are indexed and index selection on both the read and write paths
are somewhat dense and unintuitive.
> * The SecondaryIndex class hierarchy is rather convoluted and could use some serious
> There are a number of outstanding tickets which we should be able to roll into this higher
level one as subtasks (but I'll defer doing that until getting into the details of the redesign):
> * CASSANDRA-7771
> * CASSANDRA-8103
> * CASSANDRA-9041
> * CASSANDRA-4458
> * CASSANDRA-8505
> Whilst they're not hard dependencies, I propose that this be done on top of both CASSANDRA-8099
and CASSANDRA-6717. The former largely because the storage engine changes may facilitate a
friendlier index API, but also because of the changes to SIS mentioned above. As for 6717,
the changes to schema tables there will help facilitate CASSANDRA-7771.

This message was sent by Atlassian JIRA

View raw message