cassandra-user mailing list archives

From Rudolf van der Leeden <rudolf.vanderlee...@scoreloop.com>
Subject StackOverflowError with repair after bulkloading SSTables
Date Fri, 20 Jul 2012 11:25:36 GMT
Hi,

I'm currently testing the restore of a Cassandra 1.1.2 snapshot.

The steps to reproduce the problem:

 - snapshot a 3-node production cluster (1.1.2) with RF=3 and LCS (leveled compaction) ==> 8GB data/node
 - create a new 3-node cluster (node1,2,3)
 - stop node1 / copy data (SSTables) from the snapshot (just one node) / start node1
 - Cassandra opens 1185 SSTable files (*-hd-XXXX); pending compaction tasks: 247
 - before Cassandra starts compacting, run: nodetool repair -pr

The error messages in system.log :

 INFO [AntiEntropySessions:1] 2012-07-20 10:53:16,743 AntiEntropyService.java (line 666) [repair
#1c59b930-d259-11e1-0000-a0b0843ee1fe] new session: will sync /10.241.65.232, /10.54.26.250,
/10.251.33.166 on range (113427455640312821154458202477256070485,0] for highscores.[highscore]
 INFO [AntiEntropySessions:1] 2012-07-20 10:53:16,747 AntiEntropyService.java (line 871) [repair
#1c59b930-d259-11e1-0000-a0b0843ee1fe] requesting merkle trees for highscore (to [/10.54.26.250,
/10.251.33.166, /10.241.65.232])
 INFO [AntiEntropyStage:1] 2012-07-20 10:53:17,085 AntiEntropyService.java (line 206) [repair
#1c59b930-d259-11e1-0000-a0b0843ee1fe] Received merkle tree for highscore from /10.54.26.250
 INFO [AntiEntropyStage:1] 2012-07-20 10:53:17,104 AntiEntropyService.java (line 206) [repair
#1c59b930-d259-11e1-0000-a0b0843ee1fe] Received merkle tree for highscore from /10.251.33.166
ERROR [ValidationExecutor:1] 2012-07-20 10:53:17,865 AbstractCassandraDaemon.java (line 134)
Exception in thread Thread[ValidationExecutor:1,1,main]
java.lang.StackOverflowError
        at com.google.common.collect.Sets$1.iterator(Sets.java:578)    ....  (repeating 1024 times)
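
To make the excerpt above easier to read: merkle trees were requested from three endpoints, but "Received merkle tree" lines appear for only two of them before the StackOverflowError. A minimal sketch of that check, assuming the log wording is exactly as quoted above (the function name `missing_merkle_trees` is made up for illustration, and the sketch does not separate multiple repair sessions):

```python
import re


def missing_merkle_trees(log_text):
    """Return endpoints that were asked for a merkle tree but have no
    matching 'Received merkle tree' line in the given log text."""
    # The archive wraps long log lines, so collapse all whitespace first.
    text = " ".join(log_text.split())

    # Endpoints listed in "requesting merkle trees for <cf> (to [...])".
    requested = set()
    pattern = r"requesting merkle trees for \S+ \(to \[([^\]]*)\]\)"
    for match in re.finditer(pattern, text):
        requested.update(ep.strip() for ep in match.group(1).split(","))

    # Endpoints named in "Received merkle tree for <cf> from /<ip>".
    received = set(
        re.findall(r"Received merkle tree for \S+ from (/[\d.]+)", text)
    )
    return sorted(requested - received)
```

Applied to the lines quoted above, this leaves /10.241.65.232 as the only endpoint without a received tree.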

The repair command does not return. 
The repair command increases the Active/Pending counters of "AntiEntropySessions" in tpstats.

The counters never go back to 0.
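
The stuck counters can be watched with a small script. The following is only a sketch: it assumes that each data row of `nodetool tpstats` output starts with the pool name followed by the Active and Pending columns, and the helper name `pool_counters` is hypothetical.

```python
def pool_counters(tpstats_output, pool="AntiEntropySessions"):
    """Return (active, pending) for the named pool, or None if the pool
    does not appear in the given tpstats text.

    Assumption: each data row looks like
    '<pool name> <active> <pending> ...', separated by whitespace.
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == pool:
            return int(fields[1]), int(fields[2])
    return None
```

A result other than (0, 0) long after the repair was started would match the behaviour described above.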

After some time, compaction starts as usual without problems.

Am I doing something wrong? The error occurs only with LCS; there is no problem with STCS (size-tiered compaction).
There is plenty of space in the Java heap (7G) and on disk (1.7TB).
RAM is 15G and swap is 20G. This is an Amazon m1.xlarge instance running Ubuntu Lucid.

Thanks for any hints or help,
Rudolf VanderLeeden
Scoreloop/RIM

