incubator-cassandra-user mailing list archives

From Ryan King <r...@twitter.com>
Subject Re: SSTable corruption blocking compaction and scrub can't fix it
Date Fri, 17 Jun 2011 17:06:56 GMT
Even without lsof, you should be able to get the data from /proc/$pid

-ryan
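
A minimal sketch of the /proc approach, assuming a Linux host and permission to read the target process's /proc entries (same user or root). The function names here are hypothetical helpers, not part of Cassandra or any tool mentioned in this thread. Note also that in bash a bare `ulimit` reports the file-size limit (`-f`), not the open-file limit (`-n`), so the "Max open files" row of /proc/$pid/limits is probably the more reliable thing to check.

```python
import os
from pathlib import Path


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd; each entry is one open descriptor."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_open_file_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits file.

    Values are returned as strings because either one may be 'unlimited'.
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None


def open_file_limit(pid):
    """Read the limit that actually applies to the running process."""
    return parse_open_file_limit(Path(f"/proc/{pid}/limits").read_text())


if __name__ == "__main__":
    # Stand-in pid for illustration; substitute the pid of the Cassandra JVM.
    pid = os.getpid()
    print("open descriptors:", count_open_fds(pid))
    print("max open files (soft, hard):", open_file_limit(pid))
```

Comparing the descriptor count against the soft limit while scrub is running should show whether the process is really approaching its limit.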

On Fri, Jun 17, 2011 at 5:08 AM, Dominic Williams
<dwilliams@system7.co.uk> wrote:
> Unfortunately I shut down that node, and anyway lsof wasn't installed.
> But $ ulimit gives
> unlimited
>
> On 17 June 2011 13:00, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>
>> On Fri, Jun 17, 2011 at 1:51 PM, Dominic Williams
>> <dwilliams@system7.co.uk> wrote:
>> > As far as scrub goes, that could be it. I'm already running with unlimited
>> > file handles, though, so ulimit is not the answer, unfortunately.
>>
>> Are you sure? How many file descriptors are open on the system when
>> you get that scrub exception? If you really have unlimited file handles
>> and Cassandra uses them all, then we have a more serious problem than
>> just CASSANDRA-2669.
>>
>> --
>> Sylvain
>>
>> > Dominic
>> > On 17 June 2011 12:12, Sylvain Lebresne <sylvain@datastax.com> wrote:
>> >>
>> >> Scrub apparently dies because it cannot acquire a file descriptor.
>> >> Scrub does not correctly close files
>> >> (https://issues.apache.org/jira/browse/CASSANDRA-2669),
>> >> so that may be part of why this happens. However, a simple fix is
>> >> probably to raise the file descriptor limit.
>> >>
>> >> --
>> >> Sylvain
>> >>
>> >>
>> >> On Fri, Jun 17, 2011 at 11:31 AM, Dominic Williams
>> >> <dwilliams@system7.co.uk> wrote:
>> >> > Hi all,
>> >> > Anyone experiencing this?
>> >> > I noticed one of my 7.6-2 nodes had inexplicable and consistently high CPU
>> >> > usage. Checking the log, I found that some kind of SSTable corruption
>> >> > was stopping a bunch of files from compacting (first trace copied below).
>> >> > I then tried scrub (before anyone asks, it had been run when upgrading,
>> >> > but I was just trying to find out whether it fixed this), but that failed
>> >> > with "Too many open files" even though the node should be using mem map
>> >> > (second trace copied below).
>> >> > The current plan is to snapshot, decommission & bootstrap, but this is not
>> >> > what should be happening. The only factor I can think of that might have
>> >> > caused the problem is that we are currently running a huge GCGraceSeconds
>> >> > because we can't run repair (we are hanging on for 8.1, as CASSANDRA-2280
>> >> > causes repair to consume all disk space and fail).
>> >> >  INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232 SSTable.java (line 147) Deleted /var/opt/cassandra/data/FightMyMonster/Users_CisIndex-f-5922
>> >> >  INFO [MutationStage:7989] 2011-06-16 22:49:57,852 ColumnFamilyStore.java (line 1065) Enqueuing flush of Memtable-UserMonsters@423965881(34471 bytes, 322112 operations)
>> >> >  INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 Memtable.java (line 157) Writing Memtable-UserMonsters@423965881(34471 bytes, 322112 operations)
>> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 Memtable.java (line 172) Completed flushing /var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7614-Data.db (7250239 bytes)
>> >> >  INFO [CompactionExecutor:1] 2011-06-16 22:50:01,090 CompactionManager.java (line 395) Compacting
>> >> > [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6996-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7000-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7014-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7032-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7036-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7050-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')]
>> >> >  INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399 ColumnFamilyStore.java (line 1065) Enqueuing flush of Memtable-Monsters@562380513(497561 bytes, 13860 operations)
>> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 Memtable.java (line 157) Writing Memtable-Monsters@562380513(497561 bytes, 13860 operations)
>> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 Memtable.java (line 172) Completed flushing /var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db (471100 bytes)
>> >> > ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
>> >> > java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
>> >> >         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
>> >> >         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
>> >> >         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
>> >> >         at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>> >> >         at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>> >> >         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
>> >> >         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
>> >> >         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
>> >> >         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>> >> >         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>> >> >         at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
>> >> >         at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154)
>> >> >         at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110)
>> >> >         at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45)
>> >> >         at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
>> >> >         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>> >> >         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>> >> >         at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
>> >> >         at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
>> >> >         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448)
>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
>> >> >         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
>> >> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >> >         at java.lang.Thread.run(Thread.java:619)
>> >> > Caused by: java.io.IOException: Corrupt (negative) value length encountered
>> >> >         at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
>> >> >         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
>> >> >         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
>> >> >         ... 26 more
>> >> >  INFO [CompactionExecutor:1] 2011-06-16 22:50:37,794 CompactionManager.java (line 395) Compacting
>> >> > [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7157-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7158-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7159-Data.db'),
>> >> > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db')]
>> >> >
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7554-Data.db') complete: 244 rows in new sstable and 0 empty (tombstoned) rows dropped
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db')
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db') complete: 493 rows in new sstable and 0 empty (tombstoned) rows dropped
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db')
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,225 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db') complete: 275 rows in new sstable and 0 empty (tombstoned) rows dropped
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,226 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db')
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:28,383 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db') complete: 297 rows in new sstable and 0 empty (tombstoned) rows dropped
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:28,384 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db')
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db') complete: 347 rows in new sstable and 0 empty (tombstoned) rows dropped
>> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db')
>> >> > ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
>> >> > java.io.FileNotFoundException: /var/opt/cassandra/data/FightMyMonster/UserMonsters-tmp-f-7823-Data.db (Too many open files)
>> >> >         at java.io.RandomAccessFile.open(Native Method)
>> >> >         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>> >> >         at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:113)
>> >> >         at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:78)
>> >> >         at org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:2243)
>> >> >         at org.apache.cassandra.db.CompactionManager.maybeCreateWriter(CompactionManager.java:794)
>> >> >         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:534)
>> >> >         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>> >> >         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>> >> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >> >         at java.lang.Thread.run(Thread.java:619)
>> >> > Dominic
>> >
>> >
>
>
