cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3006) Enormous counter
Date Wed, 10 Aug 2011 13:25:27 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3006:
----------------------------------------

    Attachment: 3006.patch

Thanks, that helps a lot.

The problem is caused by the RowMutation optimization that keeps the serialized data received
on the wire and reuses it for the commit log. This is wrong for counters, because the delta on
the counter columns is cleaned during deserialization of the RowMutation; reusing the original
bytes writes the uncleaned delta instead.
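The failure mode can be illustrated with a toy model. Everything in it is hypothetical: the class name, the 9-byte encoding and the single `delta` flag are simplifications and do not reflect Cassandra's actual RowMutation or counter-context format. Deserialization clears the delta in memory, but the commit-log path returns the cached wire bytes, so the delta survives in what gets logged (or stored as a hint).

```python
import struct
from typing import Optional

# Hypothetical toy encoding: 1-byte delta flag followed by an 8-byte signed count.
FMT = ">?q"


def encode(delta: bool, value: int) -> bytes:
    return struct.pack(FMT, delta, value)


class ToyCounterMutation:
    """Hypothetical stand-in for a counter RowMutation received from another node."""

    def __init__(self, delta: bool, value: int, wire_bytes: Optional[bytes] = None):
        self.delta = delta
        self.value = value
        self.wire_bytes = wire_bytes  # cached copy of what arrived on the wire

    @classmethod
    def from_wire(cls, data: bytes) -> "ToyCounterMutation":
        _delta, value = struct.unpack(FMT, data)
        # The deserialization path is where the delta gets cleaned
        # (in this toy, "cleaning" simply clears the flag)...
        return cls(delta=False, value=value, wire_bytes=data)

    def serialize(self) -> bytes:
        return encode(self.delta, self.value)

    def bytes_for_commit_log(self) -> bytes:
        # ...but the optimization reuses the original wire bytes when present,
        # so the cleaned in-memory state never reaches the commit log.
        return self.wire_bytes if self.wire_bytes is not None else self.serialize()


received = encode(delta=True, value=1)
m = ToyCounterMutation.from_wire(received)
logged_delta, _ = struct.unpack(FMT, m.bytes_for_commit_log())
print(m.delta, logged_delta)  # False True
```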

At least in the reproduction script, this was a hint problem. The first node was creating
a hint for the second one and storing it itself (and the value of this hint was not
cleaned because of the bug above).

Patch attached to fix this. It basically disables the optimization for RowMutations that contain
counters. I don't claim this is particularly clean, but I don't see any other simple solution.
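The shape of the fix, as described here, is a guard: reuse the wire bytes only when the mutation touches no counter column family, and otherwise re-serialize from the cleaned in-memory state. The sketch below is a hypothetical illustration of that guard only; the class names, the `is_counter` flag and the placeholder serializer are assumptions, not the code in 3006.patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyColumnFamilyUpdate:
    name: str
    is_counter: bool  # hypothetical flag marking a counter column family


@dataclass
class ToyRowMutation:
    updates: List[ToyColumnFamilyUpdate]
    wire_bytes: Optional[bytes] = None  # bytes received on the wire, if any

    def contains_counters(self) -> bool:
        return any(u.is_counter for u in self.updates)

    def serialize(self) -> bytes:
        # Placeholder serializer: re-encodes from the (cleaned) in-memory state.
        return ",".join(u.name for u in self.updates).encode()

    def bytes_for_commit_log(self) -> bytes:
        # Keep the optimization for ordinary mutations, but always re-serialize
        # counter mutations so their cleaned state is what gets logged.
        if self.wire_bytes is not None and not self.contains_counters():
            return self.wire_bytes
        return self.serialize()


regular = ToyRowMutation([ToyColumnFamilyUpdate("users", False)], wire_bytes=b"cached")
counter = ToyRowMutation([ToyColumnFamilyUpdate("testCounter", True)], wire_bytes=b"cached")
print(regular.bytes_for_commit_log())  # b'cached'
print(counter.bytes_for_commit_log())  # b'testCounter'
```

The trade-off matches the comment: non-counter mutations keep the cheaper path, while counter mutations pay for a second serialization.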

> Enormous counter 
> -----------------
>
>                 Key: CASSANDRA-3006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.3
>         Environment: ubuntu 10.04
>            Reporter: Boris Yen
>            Assignee: Sylvain Lebresne
>         Attachments: 3006.patch
>
>
> I have a two-node cluster with the following keyspace and column family settings.
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions: 
> 	63fda700-c243-11e0-0000-2d03dcafebdf: [172.17.19.151, 172.17.19.152]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [datacenter1:2]
>   Column Families:
>     ColumnFamily: testCounter (Super)
>     "APP status information."
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
> Then, I use a test program based on Hector to increment a counter column (testCounter[sc][column])
> 1000 times. In the middle of the process, I intentionally shut down the node 172.17.19.152.
> In addition, the test program is smart enough to switch the consistency level from
> QUORUM to ONE, so that the subsequent increments do not fail.
> After all the increments are done, I start Cassandra on 172.17.19.152 again and
> use cassandra-cli to check whether the counter is correct on both nodes. I get 1001,
> which seems reasonable because Hector retries once. However, when I shut down 172.17.19.151,
> wait until 172.17.19.152 is aware that 172.17.19.151 is down, and then start Cassandra on
> 172.17.19.151 again, checking the counter again gives 481387, which is clearly
> wrong.
> I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and earlier.
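Why the test program has to fall back from QUORUM to ONE follows from the keyspace settings: `Options: [datacenter1:2]` means a replication factor of 2, and a quorum is a majority of replicas, floor(RF / 2) + 1 = 2. With one of the two nodes down, only one replica is live, so QUORUM writes fail while ONE writes still succeed. The helper names below are hypothetical; this is a sketch of the arithmetic, not Hector's or Cassandra's code, and it ignores hinted handoff (hints do not count toward these levels).

```python
def quorum(replication_factor: int) -> int:
    # Majority of replicas: floor(RF / 2) + 1
    return replication_factor // 2 + 1


# Replicas that must acknowledge a write at each (hypothetically modelled) level.
REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def write_can_succeed(level: str, replication_factor: int, live_replicas: int) -> bool:
    return live_replicas >= REQUIRED[level](replication_factor)


rf = 2    # from "Options: [datacenter1:2]"
live = 1  # 172.17.19.152 has been shut down
for level in ("QUORUM", "ONE"):
    print(level, write_can_succeed(level, rf, live))
# QUORUM False
# ONE True
```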


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
