cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Brown (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7465) DecoratedKey assertion error on reads
Date Sat, 28 Jun 2014 05:25:24 GMT


Jason Brown commented on CASSANDRA-7465:

Yup, the SSNTI.key's BB is definitely getting corrupted. I added a quick capture of the key's
raw string in the ctor:

    public final DecoratedKey key;
    private final String originalKey;  // <- new member field

    public SSTableNamesIterator(SSTableReader sstable, DecoratedKey key, SortedSet<CellName>
        assert columns != null;
        this.sstable = sstable;
        this.columns = columns;
        this.key = key;
        originalKey = key.toString();

Then print out the originalKey on assert fail:

            assert keyInDisk.equals(key) : String.format("%s != %s in %s, originalKey = %s",
keyInDisk, key, file.getPath(), originalKey);

Here's a sample assert error output:

java.lang.AssertionError: DecoratedKey(-6380411336414528869, 30303033313546384530) != DecoratedKey(-6380411336414528869,
30303036433842334234) in /u/sdd/cassandra-jasobrown/data/Keyspace1/Standard1-6ab9bd90fe3b11e385edff96c2ef2fd6/Keyspace1-Standard1-ka-109-Data.db,
originalKey = DecoratedKey(-6380411336414528869, 30303033313546384530)

The first key printed, 30303033313546384530, is the one from disk (that was pointed to by
the buffer from the key cache); note that it matches the original key (last key printed).
The key in the middle, 30303036433842334234, is the corrupted value that occurred after the
lookup to the key cache, yet before the assert. Thus, I believe my hypothesis is correct.

As to how to fix the problem, I'm not sure if there's a quick solution to having a systematic
way to detect refs to the key's buffer, but we can quite readily fix the known problems: key
cache and, as [~benedict] correctly pointed out, row cache.

> DecoratedKey assertion error on reads
> -------------------------------------
>                 Key: CASSANDRA-7465
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 3 nodes
> Oracle Linux Server 6.3 
> kernel ver 2.6.39
>            Reporter: Jason Brown
>            Priority: Blocker
>             Fix For: 2.1.0
> Getting the following exception when running read stress:
> {code}WARN  [SharedPool-Worker-31] 2014-06-27 21:25:51,391
- Uncaught exception on thread Thread[SharedPool-Worker-31,10,main]: {}
> java.lang.AssertionError: DecoratedKey(-5397116645141815707, 30303031393143364639) !=
DecoratedKey(-5397116645141815707, 30303031343439443233) in /u/sdd/cassandra-jasobrown/data/Keyspace1/Standard1-6ab9bd90fe3b11e385edff96c2ef2fd6/Keyspace1-Standard1-ka-73-Data.db
> 	at
> 	at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(
> 	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(
> 	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(
> 	at org.apache.cassandra.db.CollationController.collectTimeOrderedData(
> 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(
> 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(
> 	at org.apache.cassandra.db.Keyspace.getRow( ~[apache-cassandra-2.1.0-rc2-SNAPSHOT.jar:2.1.0-rc2-SNAPSHOT]
> 	at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(
> 	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(
> 	at org.apache.cassandra.service.StorageProxy$
> 	at java.util.concurrent.Executors$ ~[na:1.7.0_13]
> 	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$
> 	at [apache-cassandra-2.1.0-rc2-SNAPSHOT.jar:2.1.0-rc2-SNAPSHOT]
> 	at [na:1.7.0_13]
> {code}
> I have a three node cluster, which I populate with the following stress command:
> {code}
> cassandra-stress write n=60000000 -schema replication\(factor\=2\) -key populate=1..60000000
-rate threads=42  -mode native prepared cql3 -port native=9043 thrift=9161  -node athena06-a,athena06-b,athena06-c
-col n=fixed\(21\) size=exp\(11..42\)
> {code}
> Then I run the read stress:
> {code}
> cassandra-stress read n=100000000  -key dist=extr\(1..600000000,2\)  -mode native prepared
cql3 -port native=9043 thrift=9161  -node athena06-b,athena06-c,athena06-a -col n=fixed\(21\)
-rate threads=64
> {code}
> The above exception occurs semi-frequently (several to ~50 times a minute, but seems
to depend on amount of data in cluster - anecdotal evidence only).

This message was sent by Atlassian JIRA

View raw message