cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8272) 2ndary indexes can return stale data
Date Thu, 11 May 2017 11:09:05 GMT


Sergio Bossa commented on CASSANDRA-8272:

[~adelapena], I gave a first review pass and the approach looks sensible, so +1 on that.

Unfortunately, the problem is actually quite subtle and there are at least a couple of cases
where it doesn't fully work.

First of all, when a {{LIMIT}} clause is provided, the query might return no results when
there actually are some valid ones: this is because the rows returned as a result of an "index
mismatch" are still counted against the limit (by {{CQLCounter}}), which means the coordinator
might end up with fewer valid rows than the requested limit, simply because some replicas returned
only mismatched rows. Here's a simple scenario with two nodes:
1) Write row {{key=1,index=1}}.
2) Write row {{key=2,index=1}}.
3) Shut down node 2.
4) Delete column {{index}} from row {{key=1}}: the delete will go to node 1, while node 2
will miss it.
5) Restart node 2 (hints need to be disabled).
6) Query for {{index=1}}.
7) Node 1 will return the first row found, i.e. the "mismatched" one {{key=1}}.
8) Node 2 will return the "missed delete" with {{key=1}}.
9) Coordinator will merge/post-process the rows, realize there's a mismatch and return no
results, while it should have instead returned {{key=2}}.
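
The scenario above can be sketched as a small toy model. This is only an illustration under stated assumptions, not Cassandra code and not the patch under review: {{replica_read}}, {{coordinator_read}} and the {{count_mismatches}} flag are hypothetical names, rows are reduced to a single indexed column with a write timestamp, and a deleted column is modelled as {{None}} and returned as a "mismatch" marker. The second call shows one possible direction (not counting mismatched rows against the limit), which is an assumption rather than the agreed fix.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only: (indexed column value, or None if deleted; write timestamp).
Cell = Tuple[Optional[int], int]
Replica = Dict[int, Cell]


def replica_read(data: Replica, wanted: int, limit: int,
                 count_mismatches: bool) -> Dict[int, Cell]:
    """Scan rows in key order, stopping once `limit` rows have been counted."""
    out: Dict[int, Cell] = {}
    counted = 0
    for key in sorted(data):
        if counted >= limit:
            break
        value, ts = data[key]
        matches = value == wanted
        is_mismatch = value is None  # deleted column, returned as a marker
        if not (matches or is_mismatch):
            continue
        out[key] = (value, ts)
        # Hypothetical switch: do mismatch markers consume the limit?
        if matches or count_mismatches:
            counted += 1
    return out


def coordinator_read(replicas: List[Replica], wanted: int, limit: int,
                     count_mismatches: bool) -> List[int]:
    """Merge replica responses (newest timestamp wins), then drop mismatches."""
    merged: Dict[int, Cell] = {}
    for data in replicas:
        response = replica_read(data, wanted, limit, count_mismatches)
        for key, cell in response.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    valid = [k for k in sorted(merged) if merged[k][0] == wanted]
    return valid[:limit]


# Node 1 saw the delete of `index` on key=1; node 2 missed it.
node1: Replica = {1: (None, 3), 2: (1, 2)}
node2: Replica = {1: (1, 1), 2: (1, 2)}

# Mismatches counted against LIMIT 1: both replicas stop at key=1.
print(coordinator_read([node1, node2], wanted=1, limit=1, count_mismatches=True))
# Mismatches not counted: node 1 keeps scanning and also returns key=2.
print(coordinator_read([node1, node2], wanted=1, limit=1, count_mismatches=False))
```

In this model the first call yields an empty result even though {{key=2}} is valid, while the second yields {{key=2}}.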

Second, this patch doesn't fix filtering; while it's true we have a different issue for that
({{CASSANDRA-8273}}), and while we could argue filtering isn't exactly a form of indexing,
it is still used in conjunction with indexing, and fixing indexing just to have its results
invalidated when filtering is applied seems quite confusing to me.

In the end, I'd suggest the following:
1) Stick with the current approach! It's good and I do not think using special tombstones
would buy us anything.
2) Fix the first problem above.
3) Generalize the approach so we can fix filtering and any other indexing implementation (most
notably SASI).
4) To ease the burden of porting between versions, and given this is not a trivial bug fix
at all, I'd also suggest applying it only to 3.11 onwards.


> 2ndary indexes can return stale data
> ------------------------------------
>                 Key: CASSANDRA-8272
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
> When replicas return 2ndary index results, it's possible for a single replica to return
> a stale result and that result will be sent back to the user, potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before having applied
> the update, then the now stale result will be returned (since C will return it and A or B
> will return nothing).
> A potential solution would be that when we read a tombstone in the index (and provided
> we make the index inherit the gcGrace of its parent CF), instead of skipping that tombstone,
> we'd insert in the result a corresponding range tombstone.

