cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8272) 2ndary indexes can return stale data
Date Fri, 12 May 2017 08:44:04 GMT


Sylvain Lebresne commented on CASSANDRA-8272:

bq. I disagree here: if filtering is applied on top of index results, you'll still get wrong

It's possible, but not at all guaranteed, since the index and the filtering will apply to different
columns. But that's almost beside the point: my point is that even fixing only the indexing
will avoid bugs for some people (at the very least those who don't use filtering on top of
indexing at all), so if we can't agree on how to fix filtering, I don't think we should hold
back the indexing fix.

But mostly, I just want us to have the _discussion_ around filtering in CASSANDRA-8273 to
avoid mixing things up. If we can quickly agree on moving filtering server-side, then I'm
totally fine doing that and the indexing fix in a single patch if we prefer.

bq. what about fixing filtering (that is, moving to coordinator-side filtering) only when
indexes are present?

Well, that kind of already gets into the territory of whether we're ok with moving filtering
coordinator-side. In fact, I don't think whether or not filtering is applied on top of indexing
changes that discussion in any way. Again though, I'm not at all against fixing both issues;
I just prefer discussing the two different (though related) problems separately.
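A minimal Python sketch of what moving filtering coordinator-side changes, assuming (hypothetically) that replicas ship their unfiltered rows with write timestamps and that the coordinator reconciles them last-write-wins before applying the predicate. Replica A holds the newer v = 'bar' (timestamp 2) while replica C still holds v = 'foo' (timestamp 1). `Cell`, `filter_on_replicas`, `reconcile` and `filter_on_coordinator` are illustrative names, not Cassandra APIs, and the sketch ignores the cost of shipping unfiltered rows, which is a large part of what is being debated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Cell:
    """Value of column v for one row, with its write timestamp (hypothetical model)."""
    value: str
    timestamp: int


Replica = Dict[int, Cell]  # partition key k -> Cell


def filter_on_replicas(replicas: List[Replica], predicate: Callable[[str], bool]) -> Dict[int, str]:
    """Each replica filters its own, possibly stale, data; the coordinator just unions the hits."""
    result: Dict[int, str] = {}
    for replica in replicas:
        for k, cell in replica.items():
            if predicate(cell.value):
                result[k] = cell.value
    return result


def reconcile(replicas: List[Replica]) -> Dict[int, Cell]:
    """Last-write-wins merge: keep the cell with the highest timestamp for each key."""
    merged: Dict[int, Cell] = {}
    for replica in replicas:
        for k, cell in replica.items():
            if k not in merged or cell.timestamp > merged[k].timestamp:
                merged[k] = cell
    return merged


def filter_on_coordinator(replicas: List[Replica], predicate: Callable[[str], bool]) -> Dict[int, str]:
    """Replicas ship unfiltered rows; the coordinator reconciles first, then filters."""
    return {k: cell.value for k, cell in reconcile(replicas).items() if predicate(cell.value)}


def matches_foo(v: str) -> bool:
    return v == "foo"


a: Replica = {0: Cell("bar", 2)}  # has applied the update
c: Replica = {0: Cell("foo", 1)}  # has not applied the update yet
print(filter_on_replicas([a, c], matches_foo))     # {0: 'foo'}  (stale row leaks)
print(filter_on_coordinator([a, c], matches_foo))  # {}
```

The only thing that matters here is the ordering: reconcile first, filter second, so a replica's stale match can be overridden by a newer non-matching version from another replica.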

bq. But we can still provide some API (i.e. the {{isSatisfiedBy()}} you mentioned) they can

If you're making a general point, then sure. Otherwise, I'm not sure what else you have in
mind (and as I said, I don't see what more we can do), so feel free to share.

bq. Mmmhhhh ... clunky. And error prone as the 3.X code would be probably untestable. Couldn't
the replica detect the coordinator version and return results accordingly?

We can do anything, but everything version-related is currently wired to the messaging protocol
version, which can't change in minor versions. So we'd have to rely on the version exchanged
through gossip in a way we never have, with the associated risks (typically, potential races
between when we actually get that version and when we use it). Plus, it would mean quite a few
(fairly ugly) changes to pass the version where we need it. All that in a minor release. I
doubt it's a good idea in practice in this context.

On the flip side, we have quite a bit of prior experience adding things to minor releases
to fix future major upgrades. I don't disagree that it's clunky, mind you, but better the devil
you know...

I don't see why it would be untestable, though: we can test that the added filtering doesn't
break anything in 3.x, and we can totally test upgrades.

bq. for index using custom indexes: we'd need to have them implement the {{CustomExpression#isSatisfiedBy}}

I was a bit too quick here; it's actually not that simple, because {{CustomExpression}}s are
created directly by the parser and don't depend on whichever index uses them, so we can't
have the index override/implement it. That said, we do know which index an expression is used
with when we create it, so we could change things a bit so that indexes provide us with their
own concrete implementation of {{CustomExpression}}; it's just a bit more involved than I made
it sound.
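A minimal Python sketch of that idea, assuming the code that builds the expression looks up the target index first and asks it for a concrete expression, instead of constructing one generic expression type. `CustomExpression`, `Index`, `PrefixIndex`, `make_expression`, `build_expression` and `is_satisfied_by` are hypothetical stand-ins, not Cassandra's Java classes or signatures, and the prefix-matching semantics are invented purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict

Row = Dict[str, str]


class CustomExpression(ABC):
    """Hypothetical stand-in: an expression whose matching semantics only its index knows."""

    def __init__(self, column: str, value: str) -> None:
        self.column = column
        self.value = value

    @abstractmethod
    def is_satisfied_by(self, row: Row) -> bool:
        """Re-check a (reconciled) row against the expression."""


class Index(ABC):
    @abstractmethod
    def make_expression(self, column: str, value: str) -> CustomExpression:
        """Each index returns its own concrete expression type."""


class PrefixIndex(Index):
    """Hypothetical custom index that matches rows whose column starts with the given value."""

    class _PrefixExpression(CustomExpression):
        def is_satisfied_by(self, row: Row) -> bool:
            return row.get(self.column, "").startswith(self.value)

    def make_expression(self, column: str, value: str) -> CustomExpression:
        return PrefixIndex._PrefixExpression(column, value)


def build_expression(indexes: Dict[str, Index], index_name: str, column: str, value: str) -> CustomExpression:
    """Expression creation: resolve the target index, then let it build the expression."""
    return indexes[index_name].make_expression(column, value)


expr = build_expression({"v_prefix": PrefixIndex()}, "v_prefix", "v", "fo")
print(expr.is_satisfied_by({"k": "0", "v": "foo"}))  # True
print(expr.is_satisfied_by({"k": "0", "v": "bar"}))  # False
```

The design point is that the generic layer only ever sees the abstract `is_satisfied_by`, so it can re-apply any custom expression to reconciled rows without knowing what the index actually does.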

> 2ndary indexes can return stale data
> ------------------------------------
>                 Key: CASSANDRA-8272
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
> When replicas return 2ndary index results, it's possible for a single replica to return a stale result and for that result to be sent back to the user, potentially violating the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> Then, if A and B acknowledge the update but C responds to the read before having applied it, the now-stale result will be returned (since C will return it while A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (provided we make the index inherit the gcGrace of its parent CF), instead of skipping that tombstone, we'd insert a corresponding range tombstone in the result.
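A toy Python model of the scenario in the description and of the proposed fix, under loudly simplified assumptions: a single table `test (k int PRIMARY KEY, v text)` with an index on v, replicas that apply writes in timestamp order, a per-key index tombstone standing in for the range tombstone the description mentions, and a coordinator that merges replica responses by timestamp. `Hit`, `Replica`, `query_index`, `send_tombstones` and `merge_by_timestamp` are hypothetical names, not Cassandra code. It shows why last-write-wins reconciliation alone doesn't help today (A sends nothing for k = 0, so there is nothing to reconcile C's stale hit against) and how returning the tombstone lets the newer deletion shadow it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hit:
    """One index-query result entry for a partition key (hypothetical model)."""
    value: Optional[str]  # None marks a tombstone: "this key no longer has this value"
    timestamp: int


class Replica:
    """Toy replica with a local index on v.

    Overwritten index entries are kept as tombstones, assumed to survive for the
    base table's gcGrace so they are still present when a stale replica is read.
    """

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.index: Dict[str, Dict[int, Hit]] = {}

    def write(self, k: int, v: str, ts: int) -> None:
        old = self.rows.get(k)
        if old is not None and old != v:
            # Shadow the old index entry with a tombstone at the new write's timestamp.
            self.index.setdefault(old, {})[k] = Hit(None, ts)
        self.rows[k] = v
        self.index.setdefault(v, {})[k] = Hit(v, ts)

    def query_index(self, v: str, send_tombstones: bool) -> Dict[int, Hit]:
        entries = self.index.get(v, {})
        if send_tombstones:
            return dict(entries)  # proposed: return tombstones instead of skipping them
        return {k: h for k, h in entries.items() if h.value is not None}  # current: live hits only


def merge_by_timestamp(responses: List[Dict[int, Hit]]) -> Dict[int, Optional[str]]:
    """Keep the newest entry per key, then drop keys whose newest entry is a tombstone."""
    newest: Dict[int, Hit] = {}
    for response in responses:
        for k, hit in response.items():
            if k not in newest or hit.timestamp > newest[k].timestamp:
                newest[k] = hit
    return {k: h.value for k, h in newest.items() if h.value is not None}


a, b, c = Replica(), Replica(), Replica()
for r in (a, b, c):
    r.write(0, "foo", ts=1)
for r in (a, b):  # the UPDATE is acknowledged by A and B; C has not applied it yet
    r.write(0, "bar", ts=2)

# A QUORUM read of v = 'foo' that happens to contact A and C.
print(merge_by_timestamp([r.query_index("foo", send_tombstones=False) for r in (a, c)]))  # {0: 'foo'}
print(merge_by_timestamp([r.query_index("foo", send_tombstones=True) for r in (a, c)]))   # {}
```

The gcGrace condition matters in this model too: if A purged its index tombstone before C was repaired, A would go back to sending nothing for k = 0 and C's stale hit would leak again.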

This message was sent by Atlassian JIRA
