cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8505) Invalid results are returned while secondary index are being build
Date Wed, 11 Nov 2015 17:47:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000767#comment-15000767
] 

Sam Tunnicliffe commented on CASSANDRA-8505:
--------------------------------------------

I think the various index states can be reduced to a simple ready/not ready check. What's
more unless we intend to change the established behaviour fairly significantly, once an index
moves to a ready state it never moves back to being not ready. The only times when we modify
the status in the system table are when the index is removed (in which case we have no problem
with being able to query using it) or during a rebuild. In the latter case though, we probably
shouldn't reject queries (and we don't currently), as an index rebuild is incremental. That
is, we don't scrap the existing index tables and rebuild everything from scratch, just write
new index SSTables to supercede the old ones. So although it's certainly possible to get incorrect
results during a rebuild (because of missing/stale entries), the results only get more correct
as the rebuild progresses. Changing this so that all queries against that index return errors
until all rebuilds complete seems like a step backwards. It seems more reasonable to reject
queries until the initial build has been performed, as per the example in the description,
but this only requires a simple boolean to track state between instantiating/registering the
index and its initial build task completing (if one is required). 

It would be good to have some test coverage of this, although the best I could come up with
is a dtest which inserts many rows, then adds the index and queries immediately expecting
ReadFailureException, which is fairly lame and fragile.

A couple of points specific to the 3.0 patch:

* The fix for CASSANDRA-10595 has been lost. If an index doesn't register itself in {{createIndex}},
don't ask it for an initalization task, just set {{initialBuildTask == null}}. 
* {{SIM::reloadIndex}} has changed since the patch was created (due to CASSANDRA-10604) -
I think that no changes to this method are now required. I did notice though that the current
implementation actually makes a redundant call to {{getMetadataReloadTask}}, so if you could
fix that while you're here, that'd be great.

bq. Secondary index and their build/not build status are node-local. By consequence it is
not possible to know on a coordinator node if the index is fully build. It can be built on
the coordinator but still building on other nodes

For future reference on this point, we also have CASSANDRA-9967 which has a very similar intent.

> Invalid results are returned while secondary index are being build
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8505
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8505
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>             Fix For: 2.2.x, 3.0.x
>
>
> If you request an index creation and then execute a query that use the index the results
returned might be invalid until the index is fully build. This is caused by the fact that
the table column will be marked as indexed before the index is ready.
> The following unit tests can be use to reproduce the problem:
> {code}
>     @Test
>     public void testIndexCreatedAfterInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
>     
>     @Test
>     public void testIndexCreatedBeforeInsert() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, c int, primary key((a, b)))");
>         createIndex("CREATE INDEX ON %s(b)");
>         
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 0, 0);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 1, 1);");
>         execute("INSERT INTO %s (a, b, c) VALUES (0, 2, 2);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 0, 3);");
>         execute("INSERT INTO %s (a, b, c) VALUES (1, 1, 4);");
>         assertRows(execute("SELECT * FROM %s WHERE b = ?;", 1),
>                    row(0, 1, 1),
>                    row(1, 1, 4));
>     }
> {code}
> The first test will fail while the second will work. 
> In my opinion the first test should reject the request as invalid (as if the index was
not existing) until the index is fully build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message