cassandra-commits mailing list archives

From "Boris Yen (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3006) Enormous counter
Date Wed, 10 Aug 2011 03:33:27 GMT


Boris Yen commented on CASSANDRA-3006:

To make this issue easier to reproduce, here is how I recreate it, step
by step.

1. clean anything that is inside /var/lib/cassandra on node

2. start cassandra on node

3. clean anything that is inside /var/lib/cassandra on node

4. modify the cassandra.yaml of and add as a seed.

5. start cassandra on node. I could see that the two nodes had formed a cluster, and I
double-checked that using nodetool.

6. on node, I use cassandra-cli: to connect, and execute
commands -> 

create keyspace test
with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options = [{datacenter1:2}];

create column family testCounter
    with column_type = Super
    and default_validation_class = CounterColumnType
    and replicate_on_write = true
    and comparator = BytesType
    and subcomparator = BytesType
    and comment = 'APP status information.';

7. use the test program to increment the counter 1000 times. Between increments, the program
pauses for 50 milliseconds.

8. in the middle of the adding process, shut down cassandra on node, (let's
say I shut down node when the count is 200). Because the test program changes the
consistency level to One when it encounters an exception (a timeout exception, to be exact),
the subsequent increments still succeed.

9. wait for the overall adding process to complete. I saw "success counter: 999" due to one

10. use the cassandra-cli to connect to and and check the counter
value; the value is 1001 on both nodes. It shows 1001 because hector retries when it encounters
the timeout exception.

11. shut down cassandra on, and wait a few seconds. I saw "InetAddress /
is now dead" on node

12. after seeing "InetAddress / is now dead", restart the cassandra on node

13. check the counter again with cassandra-cli on both nodes. This time the counter should
no longer be 1001; it should be some other, unexpected number.
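
The arithmetic in steps 7 to 10 (999 successes reported by the test program, but a stored
value of 1001) can be illustrated with a small simulation. This is only a sketch under
stated assumptions, not the actual test program and not a description of Hector or
Cassandra internals: FakeCluster, client_increment and run_test are hypothetical names,
and the model assumes that an increment which times out has already been applied once on
the surviving node, so the client-side retry applies it a second time. It does not attempt
to reproduce the wrong value seen after step 13.

```python
import time


class WriteTimeout(Exception):
    """Raised when too few replicas acknowledge a write in time."""


class FakeCluster:
    """Toy two-replica cluster (hypothetical; not Cassandra's real behavior)."""

    def __init__(self):
        self.counter = 0
        self.live_replicas = 2

    def increment(self, consistency):
        # Assumption: the increment is applied locally before the coordinator
        # notices that too few replicas can acknowledge it.
        self.counter += 1
        required = 2 if consistency == "QUORUM" else 1  # quorum of RF=2 is 2
        if self.live_replicas < required:
            raise WriteTimeout()


def client_increment(cluster, consistency, retries=1):
    """Stand-in for a client library that retries once after a timeout."""
    for attempt in range(retries + 1):
        try:
            cluster.increment(consistency)
            return
        except WriteTimeout:
            if attempt == retries:
                raise


def run_test(total=1000, stop_at=200, pause_s=0.0):
    """Increment `total` times; one node goes down at increment `stop_at`."""
    cluster = FakeCluster()
    consistency = "QUORUM"
    successes = 0
    for i in range(total):
        if i == stop_at:
            cluster.live_replicas = 1  # step 8: one node is shut down
        try:
            client_increment(cluster, consistency)
            successes += 1
        except WriteTimeout:
            consistency = "ONE"  # downgrade so later increments succeed
        time.sleep(pause_s)  # the report uses 0.05 (50 milliseconds)
    return successes, cluster.counter


if __name__ == "__main__":
    print(run_test())  # (999, 1001)
```

In this model the single failed increment is applied twice (first attempt plus one retry),
which gives 999 + 2 = 1001. Counter increments are not idempotent, so any retry after a
timeout can overcount in this way.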

I hope someone else can reproduce it with these steps.

> Enormous counter 
> -----------------
>                 Key: CASSANDRA-3006
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.3
>         Environment: ubuntu 10.04
>            Reporter: Boris Yen
>            Assignee: Sylvain Lebresne
> I have two-node cluster with the following keyspace and column family settings.
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions: 
> 	63fda700-c243-11e0-0000-2d03dcafebdf: [,]
> Keyspace: test:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [datacenter1:2]
>   Column Families:
>     ColumnFamily: testCounter (Super)
>     "APP status information."
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
> Then, I use a test program based on hector to increment a counter column (testCounter[sc][column])
1000 times. In the middle of the adding process, I intentionally shut down the node
In addition, the test program switches the consistency level from
Quorum to One, so that the subsequent increments do not fail.
> After all the increments are done, I start cassandra on, and I
use cassandra-cli to check whether the counter is correct on both nodes. I get 1001,
which seems reasonable because hector retries once. However, when I shut down
and after is aware of is down, I try to start the cassandra on again.
Then I check the counter again, and this time I get 481387, which
is clearly wrong.
> I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and earlier.

