cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sam Tunnicliffe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12651) Failure in SecondaryIndexTest.testAllowFilteringOnPartitionKeyWithSecondaryIndex
Date Thu, 17 Nov 2016 18:09:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674407#comment-15674407
] 

Sam Tunnicliffe commented on CASSANDRA-12651:
---------------------------------------------

The root cause essentially what [~cam1982] described in CASSANDRA-12590 (specifically, [here|https://issues.apache.org/jira/browse/CASSANDRA-12590?focusedCommentId=15484088&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15484088]).

Index tables use {{LocalPartitioner/LocalToken}} and {{LocalToken}} holds a reference to value
object used to construct it. In this case, that value is a {{ByteBuffer}} representing a cell
in the base table (i.e. the indexed value). When {{offheap_buffers}} or {{offheap_objects}}
are in play (as they are in {{test/conf/cassandra.yaml}} on 3.X/trunk), this indexed cell
value may be backed by a {{DirectByteBuffer}} which is recycled sometime after the memtable
containing it is discarded. The reason this causes this particular test to fail is that the
min/max {{DecoratedKey}} for the index's sstables are also references (at least, they are
immediately after writing and until the table is next opened). The min/max keys are used on
the read path to construct the interval tree which determines which sstables are used for
a given read query. If one of these direct buffers gets recycled/cleaned/freed, this can corrupt
the sstable, causing the interval tree search to fail which results in sstables being skipped
incorrectly. 

I added some logging to the test which captures this by reporting some metadata about the
index sstable before running each query:

This log failing test run shows the sstable being corrupted before the 5th query is executed.
The token for key 5 changes from {{5}} to {{0}}, causing the interval tree search to fail
& sstable to be skipped.
{code}
INFO  [main] 2016-11-17 09:57:48,577 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-3faeee00acac11e691265959e4dbcb49/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 09:57:48,578 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-3faeee00acac11e691265959e4dbcb49/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 09:57:48,580 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-3faeee00acac11e691265959e4dbcb49/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 09:57:48,581 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-3faeee00acac11e691265959e4dbcb49/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 09:57:48,582 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-3faeee00acac11e691265959e4dbcb49/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(0, 00000005))] 
{code}
And here the 4th query failed. The sstable was actually corrupted before the 3rd query ran,
but that expects to filter all index results which masks the index lookup failure.
{code}
INFO  [main] 2016-11-17 10:06:03,061 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-666d7d30acad11e6bb81b1960637036c/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 10:06:03,062 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-666d7d30acad11e6bb81b1960637036c/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(5, 00000005))] 
INFO  [main] 2016-11-17 10:06:03,064 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-666d7d30acad11e6bb81b1960637036c/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(0, 00000005))] 
INFO  [main] 2016-11-17 10:06:03,064 Index CFS live memtables: [(DataFile: /home/automaton/cassandra/build/test/cassandra/data:0/cql_test_keyspace/table_37-666d7d30acad11e6bb81b1960637036c/.v_idx_1/mc-1-big-Data.db,
Min: DecoratedKey(1, 00000001), Max: DecoratedKey(0, 00000005))] 
{code}

On CASSANDRA-12590, I suggested that this was probably only a problem for 2i tables, and advocated
against having {{LocalToken}} make an on-heap copy of the buffer. I still think that the problem
only affects indexes, but as the batchlog table is the only other user of {{LocalPartitioner}}
the cost should be minimal and it's much cleaner to have the on heap copy made in {{LocalToken}}.
Testing that locally with extra debug logging shows that the index sstables are no longer
holding references to the direct buffers, so I'm pretty sure this will fix the problem. I've
kicked off a jenkins job to run 200 iterations of the flaky test & if that reports 0 failures,
I'll be pretty confident this is the fix.

FTR, this isn't a problem in 2.2 as the insert into the index table is done using the original
cell from the write, which is then cloned (potentially moving offheap) before being inserted
into the base memtable.  


> Failure in SecondaryIndexTest.testAllowFilteringOnPartitionKeyWithSecondaryIndex
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12651
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Joel Knighton
>            Assignee: Sam Tunnicliffe
>              Labels: test-failure
>
> This has failed with/without compression.
> Stacktrace:
> {code}
> junit.framework.AssertionFailedError: Got less rows than expected. Expected 2 but got
0
> 	at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:909)
> 	at org.apache.cassandra.cql3.validation.entities.SecondaryIndexTest.lambda$testAllowFilteringOnPartitionKeyWithSecondaryIndex$78(SecondaryIndexTest.java:1228)
> 	at org.apache.cassandra.cql3.validation.entities.SecondaryIndexTest$$Lambda$293/218688965.apply(Unknown
Source)
> 	at org.apache.cassandra.cql3.CQLTester.beforeAndAfterFlush(CQLTester.java:1215)
> 	at org.apache.cassandra.cql3.validation.entities.SecondaryIndexTest.testAllowFilteringOnPartitionKeyWithSecondaryIndex(SecondaryIndexTest.java:1218)
> {code}
> Examples:
> http://cassci.datastax.com/job/trunk_testall/1176/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> http://cassci.datastax.com/job/trunk_testall/1176/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex_compression/
> http://cassci.datastax.com/job/trunk_testall/1219/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> http://cassci.datastax.com/job/trunk_testall/1216/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> http://cassci.datastax.com/job/trunk_testall/1208/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> http://cassci.datastax.com/job/trunk_testall/1176/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> http://cassci.datastax.com/job/trunk_testall/1175/testReport/org.apache.cassandra.cql3.validation.entities/SecondaryIndexTest/testAllowFilteringOnPartitionKeyWithSecondaryIndex/
> May or may not be related, but there's a test failure (index duplicate):
> http://cassci.datastax.com/view/Dev/view/carlyeks/job/carlyeks-ticket-11803-3.X-testall/lastCompletedBuild/testReport/org.apache.cassandra.index.internal/CassandraIndexTest/indexOnFirstClusteringColumn_compression/
> http://cassci.datastax.com/job/ifesdjeen-11803-test-fix-trunk-testall/1/testReport/junit/org.apache.cassandra.index.internal/CassandraIndexTest/indexOnFirstClusteringColumn_compression/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message