cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dominic Williams <dwilli...@system7.co.uk>
Subject Re: SSTable corruption blocking compaction and scrub can't fix it
Date Fri, 17 Jun 2011 18:19:43 GMT
Yeah that would get the count (although I don't think you can see filenames
- or maybe I just don't know how). Unfortunately that node was shut down. I
then tried restarting with storage port 7001 to isolate as was quite toxic
for performance of cluster but it now get's OOM on restart.

If it's really worth knowing I'll do the "remove commit logs and feed them
back in slowly on successive restarts" routine to get it up again and then
run scrub to see what happens. I'm sure the result will be the same.

If there are any committers out there who want to poke around I can also
arrange to give password as node will be decommissioned anyway? You can then
take a look at the SSTable corruption too etc

On 17 June 2011 18:06, Ryan King <ryan@twitter.com> wrote:

> Even without lsof, you should be able to get the data from /proc/$pid
>
> -ryan
>
> On Fri, Jun 17, 2011 at 5:08 AM, Dominic Williams
> <dwilliams@system7.co.uk> wrote:
> > Unfortunately I shutdown that node and anyway lsof wasn't installed.
> > But $ulimit gives
> > unlimited
> >
> > On 17 June 2011 13:00, Sylvain Lebresne <sylvain@datastax.com> wrote:
> >>
> >> On Fri, Jun 17, 2011 at 1:51 PM, Dominic Williams
> >> <dwilliams@system7.co.uk> wrote:
> >> > As far as scrub goes that could be it. I'm already running unlimited
> >> > file
> >> > handles though so ulimit not answer unfortunately
> >>
> >> Are you sure ? How many file descriptors are open on the system when
> >> you get that
> >> scrub exception ? If you really have unlimited file handles and
> >> Cassandra uses them
> >> all, then we have a more serious problem than just CASSANDRA-2669.
> >>
> >> --
> >> Sylvain
> >>
> >> > Dominic
> >> > On 17 June 2011 12:12, Sylvain Lebresne <sylvain@datastax.com> wrote:
> >> >>
> >> >> Scrub apparently dies because it cannot acquire a file descriptor.
> >> >> Scrub
> >> >> does
> >> >> not correctly closes files
> >> >> (https://issues.apache.org/jira/browse/CASSANDRA-2669)
> >> >> so that may be part of why that happens. However, a simple fix is
> >> >> probably
> >> >> to
> >> >> raise up the file descriptor limit.
> >> >>
> >> >> --
> >> >> Sylvain
> >> >>
> >> >>
> >> >> On Fri, Jun 17, 2011 at 11:31 AM, Dominic Williams
> >> >> <dwilliams@system7.co.uk> wrote:
> >> >> > Hi all,
> >> >> > Anyone experiencing this..?
> >> >> > I noticed one of my 7.6-2 nodes had inexplicable and consistently
> >> >> > high
> >> >> > cpu
> >> >> > usage. Checking the log I found that there was a some kind of
> SSTable
> >> >> > corruption that was stopping a bunch of files from compacting
> (first
> >> >> > trace
> >> >> > copied below).
> >> >> > I then tried scrub (before anyone asks it had been run when
> >> >> > upgrading,
> >> >> > but I
> >> >> > was just trying to find out if this fixed it) but that failed
with
> >> >> > "Too
> >> >> > many
> >> >> > open files" even though node should be using mem map (second trace
> >> >> > copied
> >> >> > below).
> >> >> > Current plan is to snapshot, decommission & bootstrap but
not what
> >> >> > should be
> >> >> > happening. Only factor I can think of that might have caused
> problem
> >> >> > is
> >> >> > we
> >> >> > are currently running huge GCGraceSeconds because we can't run
> repair
> >> >> > (we
> >> >> > are hanging on for 8.1 as CASSANDRA-2280 causes repair to consume
> all
> >> >> > disk
> >> >> > space and fail)
> >> >> >  INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232 SSTable.java
> (line
> >> >> > 147)
> >> >> > Deleted
> /var/opt/cassandra/data/FightMyMonster/Users_CisIndex-f-5922
> >> >> >  INFO [MutationStage:7989] 2011-06-16 22:49:57,852
> >> >> > ColumnFamilyStore.java
> >> >> > (line 1065) Enqueuing flush of Memtable-UserMonsters@423965881
> (34471
> >> >> > bytes,
> >> >> > 322112 operations)
> >> >> >  INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 Memtable.java
> (line
> >> >> > 157)
> >> >> > Writing Memtable-UserMonsters@423965881(34471 bytes, 322112
> >> >> > operations)
> >> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 Memtable.java
> (line
> >> >> > 172)
> >> >> > Completed flushing
> >> >> > /var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7614-Data.db
> >> >> > (7250239
> >> >> > bytes)
> >> >> >  INFO [CompactionExecutor:1] 2011-06-16 22:50:01,090
> >> >> > CompactionManager.java
> >> >> > (line 395) Compacting
> >> >> >
> >> >> >
> >> >> >
> [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader(
> >> >> >
> >> >> >
> >> >> >
> path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra
> >> >> >
> >> >> >
> >> >> >
> /data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Use
> >> >> >
> >> >> >
> >> >> >
> rMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db
> >> >> >
> >> >> >
> >> >> >
> '),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/
> >> >> >
> >> >> >
> >> >> >
> var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/F
> >> >> >
> >> >> >
> >> >> >
> ightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonste
> >> >> >
> >> >> >
> >> >> >
> rs-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTa
> >> >> >
> >> >> >
> >> >> >
> bleReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableReader(path='/var/opt
> >> >> >
> >> >> >
> >> >> >
> /cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyM
> >> >> >
> >> >> >
> >> >> >
> onster/UserMonsters-f-7032-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-70
> >> >> >
> >> >> >
> >> >> >
> 36-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),SSTableRead
> >> >> >
> >> >> >
> >> >> >
> er(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(path='/var/opt/cassan
> >> >> >
> >> >> >
> >> >> >
> dra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/
> >> >> >
> >> >> >
> >> >> >
> UserMonsters-f-7050-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')]
> >> >> >  INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399
> >> >> > ColumnFamilyStore.java
> >> >> > (line 1065) Enqueuing flush of Memtable-Monsters@562380513(497561
> >> >> > bytes,
> >> >> > 13860 operations)
> >> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 Memtable.java
> (line
> >> >> > 157)
> >> >> > Writing Memtable-Monsters@562380513(497561 bytes, 13860
> operations)
> >> >> >  INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 Memtable.java
> (line
> >> >> > 172)
> >> >> > Completed flushing
> >> >> > /var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db
> >> >> > (471100
> >> >> > bytes)
> >> >> > ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776
> >> >> > AbstractCassandraDaemon.java (line 114) Fatal exception in thread
> >> >> > Thread[CompactionExecutor:1,1,main]
> >> >> > java.io.IOError: java.io.IOException: Corrupt (negative) value
> length
> >> >> > encountered
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
> >> >> >         at
> >> >> >
> >> >> >
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110)
> >> >> >        at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45)
> >> >> >        at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> >> >> >        at
> >> >> >
> >> >> >
> >> >> >
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
> >> >> >         at
> >> >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >> >         at java.lang.Thread.run(Thread.java:619)
> >> >> > Caused by: java.io.IOException: Corrupt (negative) value length
> >> >> > encountered
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
> >> >> >         ... 26 more
> >> >> >  INFO [CompactionExecutor:1] 2011-06-16 22:50:37,794
> >> >> > CompactionManager.java
> >> >> > (line 395) Compacting
> >> >> >
> >> >> >
> >> >> >
> [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7157-Data.db'),SSTableReader(path
> >> >> >
> >> >> >
> >> >> >
> ='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7158-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7159-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightM
> >> >> > yMonster/Monsters-f-7160-Data.db')]
> >> >> >
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756
> >> >> > CompactionManager.java
> >> >> > (line 652) Scrub of
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7554-Data.db')
> >> >> > complete: 244 rows in new sstable and 0 empty (tombstoned) rows
> >> >> > dropped
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756
> >> >> > CompactionManager.java
> >> >> > (line 511) Scrubbing
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db')
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115
> >> >> > CompactionManager.java
> >> >> > (line 652) Scrub of
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db')
> >> >> > complete: 493 rows in new sstable and 0 empty (tombstoned) rows
> >> >> > dropped
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115
> >> >> > CompactionManager.java
> >> >> > (line 511) Scrubbing
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db')
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,225
> >> >> > CompactionManager.java
> >> >> > (line 652) Scrub of
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db')
> >> >> > complete: 275 rows in new sstable and 0 empty (tombstoned) rows
> >> >> > dropped
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:26,226
> >> >> > CompactionManager.java
> >> >> > (line 511) Scrubbing
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db')
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:28,383
> >> >> > CompactionManager.java
> >> >> > (line 652) Scrub of
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db')
> >> >> > complete: 297 rows in new sstable and 0 empty (tombstoned) rows
> >> >> > dropped
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:28,384
> >> >> > CompactionManager.java
> >> >> > (line 511) Scrubbing
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db')
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300
> >> >> > CompactionManager.java
> >> >> > (line 652) Scrub of
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db')
> >> >> > complete: 347 rows in new sstable and 0 empty (tombstoned) rows
> >> >> > dropped
> >> >> >  INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300
> >> >> > CompactionManager.java
> >> >> > (line 511) Scrubbing
> >> >> >
> >> >> >
> >> >> >
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db')
> >> >> > ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374
> >> >> > AbstractCassandraDaemon.java (line 114) Fatal exception in thread
> >> >> > Thread[CompactionExecutor:1,1,main]
> >> >> > java.io.FileNotFoundException:
> >> >> >
> >> >> >
> /var/opt/cassandra/data/FightMyMonster/UserMonsters-tmp-f-7823-Data.db
> >> >> > (Too
> >> >> > many open files)
> >> >> >         at java.io.RandomAccessFile.open(Native Method)
> >> >> >         at
> java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:113)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:78)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:2243)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager.maybeCreateWriter(CompactionManager.java:794)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:534)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
> >> >> >         at
> >> >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> >> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >> >         at
> >> >> >
> >> >> >
> >> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >> >         at java.lang.Thread.run(Thread.java:619)
> >> >> > Dominic
> >> >
> >> >
> >
> >
>

Mime
View raw message