cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2798) Repair Fails 0.8
Date Mon, 20 Jun 2011 18:38:48 GMT


Sylvain Lebresne commented on CASSANDRA-2798:

Ok, I'm happy to help track this down, but somehow I can't reproduce it.
I did try with the same number of nodes, the same tokens, and the exact same column family
definitions, and I inserted 100,000 keys in each (with 1 column per key for test1 and 1 super
column with 1 column per key for test2). Once inserted, I flushed (so that there are some
sstables), killed node3, cleaned all data/commit logs, restarted node3 and ran nodetool repair
on node3. It succeeded correctly. At the end of the repair, the load of node3 was twice the
size it should be (which is expected since both other nodes will have repaired it -- which in
itself may not be the most efficient solution, but that is not the debate here) and the load
had slightly increased on the two other nodes (I'll have to check the actual reason, but again,
not the issue at hand). But after a compact, everything was back to its normal size.
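For clarity, the procedure above can be sketched as a script. This is a dry run that only
records the commands it would issue; the node name, PID matching, and data paths are my
assumptions, not taken from this mail:

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps described above.
# Node name (node3), process pattern and data paths are assumptions.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "would run: $*"; }

run nodetool -h node3 flush                       # persist memtables so sstables exist
run ssh node3 pkill -9 -f CassandraDaemon         # hard-kill node3
run ssh node3 rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog
run ssh node3 cassandra                           # restart the node, now empty
run nodetool -h node3 repair                      # rebuild it from the other replicas
```

Removing the `run` prefix (or redefining it as a no-op) would execute the steps for real.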

Now, I made several tries with different numbers of super columns/columns per key,
but I can't test everything. Maybe the size of the values plays a role too. In any case, if
you can reproduce it so simply, would you mind attaching the script you're using to "fill both
CF", as well as as many details as possible about the steps you use?

Last thing: when I talk about the load of a node, I mean the load value as reported by
nodetool ring. If you were checking the actual files on disk, please make sure to restart
the cluster after the nodetool compact, to make sure this is not just compacted files that
have not yet been deleted.
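In other words, the two measurements only become comparable after a restart. A dry-run
sketch of that check (host name, init script location, and data path are assumptions):

```shell
#!/bin/sh
# Dry-run sketch: compare the load reported by nodetool ring with the
# actual on-disk size, restarting first so compacted-away sstables are
# dropped. Host and paths are assumptions, not from the original mail.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "would run: $*"; }

run nodetool -h node3 ring                        # "Load" column: live sstable size
run nodetool -h node3 compact
run ssh node3 /etc/init.d/cassandra restart       # releases obsolete sstable files
run ssh node3 du -sh /var/lib/cassandra/data      # now comparable to the reported load
```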

Overall, right now I'm focused on the "nodetool compact after repair doesn't make the load
go down" part, which is the really weird one.

> Repair Fails 0.8
> ----------------
>                 Key: CASSANDRA-2798
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: David Arena
>            Assignee: Sylvain Lebresne
> I am seeing a fatal problem in the new 0.8
> I'm running a 3 node cluster with a replication_factor of 3..
> On Node 3.. If I
> # kill -9 cassandra-pid
> # rm -rf "All data & logs"
> # start cassandra
> # nodetool -h "node-3-ip" repair
> The whole cluster becomes duplicated..
> * e.g Before 
> node 1 -> 2.65GB
> node 2 -> 2.65GB
> node 3 -> 2.65GB
> * e.g After
> node 1 -> 5.3GB
> node 2 -> 5.3GB
> node 3 -> 7.95GB
> -> nodetool repair never ends (96+ hours), however there are no streams running, nor any
> CPU or disk activity..
> -> Manually killing the repair and restarting does not help.. Restarting the
> server/cassandra does not help..
> -> nodetool flush, compact, cleanup all complete, but do not help...
> This is not occurring in 0.7.6.. I have come to the conclusion this is a Major 0.8 issue
> Running: CentOS 5.6, JDK 1.6.0_26

This message is automatically generated by JIRA.
For more information on JIRA, see:

