cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dominic Williams <dwilli...@system7.co.uk>
Subject SSTable corruption blocking compaction and scrub can't fix it
Date Fri, 17 Jun 2011 09:31:06 GMT
Hi all,

Anyone experiencing this..?

I noticed one of my 7.6-2 nodes had inexplicable and consistently high cpu
usage. Checking the log I found that there was a some kind of SSTable
corruption that was stopping a bunch of files from compacting (first trace
copied below).

I then tried scrub (before anyone asks it had been run when upgrading, but I
was just trying to find out if this fixed it) but that failed with "Too many
open files" even though node should be using mem map (second trace copied
below).

Current plan is to snapshot, decommission & bootstrap but not what should be
happening. Only factor I can think of that might have caused problem is we
are currently running huge GCGraceSeconds because we can't run repair (we
are hanging on for 8.1 as CASSANDRA-2280 causes repair to consume all disk
space and fail)

 INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232 SSTable.java (line 147)
Deleted /var/opt/cassandra/data/FightMyMonster/Users_CisIndex-f-5922
 INFO [MutationStage:7989] 2011-06-16 22:49:57,852 ColumnFamilyStore.java
(line 1065) Enqueuing flush of Memtable-UserMonsters@423965881(34471 bytes,
322112 operations)
 INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 Memtable.java (line 157)
Writing Memtable-UserMonsters@423965881(34471 bytes, 322112 operations)
 INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 Memtable.java (line 172)
Completed flushing
/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7614-Data.db (7250239
bytes)
 INFO [CompactionExecutor:1] 2011-06-16 22:50:01,090 CompactionManager.java
(line 395) Compacting
[SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader(
path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra
/data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Use
rMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db
'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/
var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/F
ightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonste
rs-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTa
bleReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableReader(path='/var/opt
/cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyM
onster/UserMonsters-f-7032-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-70
36-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),SSTableRead
er(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(path='/var/opt/cassan
dra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/
UserMonsters-f-7050-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')]
 INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399 ColumnFamilyStore.java
(line 1065) Enqueuing flush of Memtable-Monsters@562380513(497561 bytes,
13860 operations)
 INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 Memtable.java (line 157)
Writing Memtable-Monsters@562380513(497561 bytes, 13860 operations)
 INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 Memtable.java (line 172)
Completed flushing
/var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db (471100
bytes)
ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776
AbstractCassandraDaemon.java (line 114) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.IOException: Corrupt (negative) value length
encountered
        at
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
        at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
        at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
        at
java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
        at
java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
        at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
        at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
        at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
        at
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
        at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
        at
org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
        at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154)
        at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110)
       at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45)
       at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
        at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
       at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448)
        at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
        at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Corrupt (negative) value length encountered
        at
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
        at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
        at
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
        ... 26 more
 INFO [CompactionExecutor:1] 2011-06-16 22:50:37,794 CompactionManager.java
(line 395) Compacting
[SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7157-Data.db'),SSTableReader(path
='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7158-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7159-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightM
yMonster/Monsters-f-7160-Data.db')]


 INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java
(line 652) Scrub of
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7554-Data.db')
complete: 244 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java
(line 511) Scrubbing
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db')
 INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java
(line 652) Scrub of
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db')
complete: 493 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java
(line 511) Scrubbing
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db')
 INFO [CompactionExecutor:1] 2011-06-17 00:46:26,225 CompactionManager.java
(line 652) Scrub of
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db')
complete: 275 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO [CompactionExecutor:1] 2011-06-17 00:46:26,226 CompactionManager.java
(line 511) Scrubbing
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db')
 INFO [CompactionExecutor:1] 2011-06-17 00:46:28,383 CompactionManager.java
(line 652) Scrub of
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db')
complete: 297 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO [CompactionExecutor:1] 2011-06-17 00:46:28,384 CompactionManager.java
(line 511) Scrubbing
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db')
 INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java
(line 652) Scrub of
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db')
complete: 347 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java
(line 511) Scrubbing
SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db')
ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374
AbstractCassandraDaemon.java (line 114) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.io.FileNotFoundException:
/var/opt/cassandra/data/FightMyMonster/UserMonsters-tmp-f-7823-Data.db (Too
many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at
org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:113)
        at
org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:78)
        at
org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:2243)
        at
org.apache.cassandra.db.CompactionManager.maybeCreateWriter(CompactionManager.java:794)
        at
org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:534)
        at
org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
        at
org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Dominic

Mime
View raw message