Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 278C64C69 for ; Fri, 17 Jun 2011 18:20:35 +0000 (UTC) Received: (qmail 50381 invoked by uid 500); 17 Jun 2011 18:20:32 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 50360 invoked by uid 500); 17 Jun 2011 18:20:32 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 50347 invoked by uid 99); 17 Jun 2011 18:20:32 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 17 Jun 2011 18:20:32 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of dwilliams@system7.co.uk designates 209.85.214.172 as permitted sender) Received: from [209.85.214.172] (HELO mail-iw0-f172.google.com) (209.85.214.172) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 17 Jun 2011 18:20:24 +0000 Received: by iwn39 with SMTP id 39so2968208iwn.31 for ; Fri, 17 Jun 2011 11:20:03 -0700 (PDT) Received: by 10.231.41.4 with SMTP id m4mr2093205ibe.125.1308334803111; Fri, 17 Jun 2011 11:20:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.20.2 with HTTP; Fri, 17 Jun 2011 11:19:43 -0700 (PDT) In-Reply-To: References: From: Dominic Williams Date: Fri, 17 Jun 2011 19:19:43 +0100 Message-ID: Subject: Re: SSTable corruption blocking compaction and scrub can't fix it To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=0015176f0e3e2fd13204a5ec6fe5 X-Virus-Checked: Checked by ClamAV on apache.org --0015176f0e3e2fd13204a5ec6fe5 Content-Type: text/plain; charset=ISO-8859-1 Yeah that would get the count (although I don't think you can see filenames - or maybe I just don't know how). Unfortunately that node was shut down. I then tried restarting with storage port 7001 to isolate as was quite toxic for performance of cluster but it now get's OOM on restart. If it's really worth knowing I'll do the "remove commit logs and feed them back in slowly on successive restarts" routine to get it up again and then run scrub to see what happens. I'm sure the result will be the same. If there are any committers out there who want to poke around I can also arrange to give password as node will be decommissioned anyway? You can then take a look at the SSTable corruption too etc On 17 June 2011 18:06, Ryan King wrote: > Even without lsof, you should be able to get the data from /proc/$pid > > -ryan > > On Fri, Jun 17, 2011 at 5:08 AM, Dominic Williams > wrote: > > Unfortunately I shutdown that node and anyway lsof wasn't installed. > > But $ulimit gives > > unlimited > > > > On 17 June 2011 13:00, Sylvain Lebresne wrote: > >> > >> On Fri, Jun 17, 2011 at 1:51 PM, Dominic Williams > >> wrote: > >> > As far as scrub goes that could be it. I'm already running unlimited > >> > file > >> > handles though so ulimit not answer unfortunately > >> > >> Are you sure ? How many file descriptors are open on the system when > >> you get that > >> scrub exception ? If you really have unlimited file handles and > >> Cassandra uses them > >> all, then we have a more serious problem than just CASSANDRA-2669. > >> > >> -- > >> Sylvain > >> > >> > Dominic > >> > On 17 June 2011 12:12, Sylvain Lebresne wrote: > >> >> > >> >> Scrub apparently dies because it cannot acquire a file descriptor. > >> >> Scrub > >> >> does > >> >> not correctly closes files > >> >> (https://issues.apache.org/jira/browse/CASSANDRA-2669) > >> >> so that may be part of why that happens. However, a simple fix is > >> >> probably > >> >> to > >> >> raise up the file descriptor limit. > >> >> > >> >> -- > >> >> Sylvain > >> >> > >> >> > >> >> On Fri, Jun 17, 2011 at 11:31 AM, Dominic Williams > >> >> wrote: > >> >> > Hi all, > >> >> > Anyone experiencing this..? > >> >> > I noticed one of my 7.6-2 nodes had inexplicable and consistently > >> >> > high > >> >> > cpu > >> >> > usage. Checking the log I found that there was a some kind of > SSTable > >> >> > corruption that was stopping a bunch of files from compacting > (first > >> >> > trace > >> >> > copied below). > >> >> > I then tried scrub (before anyone asks it had been run when > >> >> > upgrading, > >> >> > but I > >> >> > was just trying to find out if this fixed it) but that failed with > >> >> > "Too > >> >> > many > >> >> > open files" even though node should be using mem map (second trace > >> >> > copied > >> >> > below). > >> >> > Current plan is to snapshot, decommission & bootstrap but not what > >> >> > should be > >> >> > happening. Only factor I can think of that might have caused > problem > >> >> > is > >> >> > we > >> >> > are currently running huge GCGraceSeconds because we can't run > repair > >> >> > (we > >> >> > are hanging on for 8.1 as CASSANDRA-2280 causes repair to consume > all > >> >> > disk > >> >> > space and fail) > >> >> > INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232 SSTable.java > (line > >> >> > 147) > >> >> > Deleted > /var/opt/cassandra/data/FightMyMonster/Users_CisIndex-f-5922 > >> >> > INFO [MutationStage:7989] 2011-06-16 22:49:57,852 > >> >> > ColumnFamilyStore.java > >> >> > (line 1065) Enqueuing flush of Memtable-UserMonsters@423965881 > (34471 > >> >> > bytes, > >> >> > 322112 operations) > >> >> > INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 Memtable.java > (line > >> >> > 157) > >> >> > Writing Memtable-UserMonsters@423965881(34471 bytes, 322112 > >> >> > operations) > >> >> > INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 Memtable.java > (line > >> >> > 172) > >> >> > Completed flushing > >> >> > /var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7614-Data.db > >> >> > (7250239 > >> >> > bytes) > >> >> > INFO [CompactionExecutor:1] 2011-06-16 22:50:01,090 > >> >> > CompactionManager.java > >> >> > (line 395) Compacting > >> >> > > >> >> > > >> >> > > [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader( > >> >> > > >> >> > > >> >> > > path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra > >> >> > > >> >> > > >> >> > > /data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Use > >> >> > > >> >> > > >> >> > > rMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db > >> >> > > >> >> > > >> >> > > '),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/ > >> >> > > >> >> > > >> >> > > var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/F > >> >> > > >> >> > > >> >> > > ightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonste > >> >> > > >> >> > > >> >> > > rs-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTa > >> >> > > >> >> > > >> >> > > bleReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableReader(path='/var/opt > >> >> > > >> >> > > >> >> > > /cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyM > >> >> > > >> >> > > >> >> > > onster/UserMonsters-f-7032-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-70 > >> >> > > >> >> > > >> >> > > 36-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),SSTableRead > >> >> > > >> >> > > >> >> > > er(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(path='/var/opt/cassan > >> >> > > >> >> > > >> >> > > dra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/ > >> >> > > >> >> > > >> >> > > UserMonsters-f-7050-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')] > >> >> > INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399 > >> >> > ColumnFamilyStore.java > >> >> > (line 1065) Enqueuing flush of Memtable-Monsters@562380513(497561 > >> >> > bytes, > >> >> > 13860 operations) > >> >> > INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 Memtable.java > (line > >> >> > 157) > >> >> > Writing Memtable-Monsters@562380513(497561 bytes, 13860 > operations) > >> >> > INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 Memtable.java > (line > >> >> > 172) > >> >> > Completed flushing > >> >> > /var/opt/cassandra/data/FightMyMonster/Monsters-f-7160-Data.db > >> >> > (471100 > >> >> > bytes) > >> >> > ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776 > >> >> > AbstractCassandraDaemon.java (line 114) Fatal exception in thread > >> >> > Thread[CompactionExecutor:1,1,main] > >> >> > java.io.IOError: java.io.IOException: Corrupt (negative) value > length > >> >> > encountered > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ConcurrentSkipListMap.(ConcurrentSkipListMap.java:1443) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201) > >> >> > at > >> >> > > >> >> > > org.apache.cassandra.io.PrecompactedRow.(PrecompactedRow.java:78) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74) > >> >> > at > >> >> > > >> >> > > >> >> > > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) > >> >> > at > >> >> > > >> >> > > >> >> > > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94) > >> >> > at > >> >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > >> >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > >> >> > at java.lang.Thread.run(Thread.java:619) > >> >> > Caused by: java.io.IOException: Corrupt (negative) value length > >> >> > encountered > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248) > >> >> > ... 26 more > >> >> > INFO [CompactionExecutor:1] 2011-06-16 22:50:37,794 > >> >> > CompactionManager.java > >> >> > (line 395) Compacting > >> >> > > >> >> > > >> >> > > [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7157-Data.db'),SSTableReader(path > >> >> > > >> >> > > >> >> > > ='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7158-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Monsters-f-7159-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightM > >> >> > yMonster/Monsters-f-7160-Data.db')] > >> >> > > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 > >> >> > CompactionManager.java > >> >> > (line 652) Scrub of > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7554-Data.db') > >> >> > complete: 244 rows in new sstable and 0 empty (tombstoned) rows > >> >> > dropped > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:24,756 > >> >> > CompactionManager.java > >> >> > (line 511) Scrubbing > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db') > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 > >> >> > CompactionManager.java > >> >> > (line 652) Scrub of > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7446-Data.db') > >> >> > complete: 493 rows in new sstable and 0 empty (tombstoned) rows > >> >> > dropped > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:26,115 > >> >> > CompactionManager.java > >> >> > (line 511) Scrubbing > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db') > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:26,225 > >> >> > CompactionManager.java > >> >> > (line 652) Scrub of > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db') > >> >> > complete: 275 rows in new sstable and 0 empty (tombstoned) rows > >> >> > dropped > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:26,226 > >> >> > CompactionManager.java > >> >> > (line 511) Scrubbing > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db') > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:28,383 > >> >> > CompactionManager.java > >> >> > (line 652) Scrub of > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7580-Data.db') > >> >> > complete: 297 rows in new sstable and 0 empty (tombstoned) rows > >> >> > dropped > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:28,384 > >> >> > CompactionManager.java > >> >> > (line 511) Scrubbing > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db') > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 > >> >> > CompactionManager.java > >> >> > (line 652) Scrub of > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7574-Data.db') > >> >> > complete: 347 rows in new sstable and 0 empty (tombstoned) rows > >> >> > dropped > >> >> > INFO [CompactionExecutor:1] 2011-06-17 00:46:29,300 > >> >> > CompactionManager.java > >> >> > (line 511) Scrubbing > >> >> > > >> >> > > >> >> > > SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db') > >> >> > ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374 > >> >> > AbstractCassandraDaemon.java (line 114) Fatal exception in thread > >> >> > Thread[CompactionExecutor:1,1,main] > >> >> > java.io.FileNotFoundException: > >> >> > > >> >> > > /var/opt/cassandra/data/FightMyMonster/UserMonsters-tmp-f-7823-Data.db > >> >> > (Too > >> >> > many open files) > >> >> > at java.io.RandomAccessFile.open(Native Method) > >> >> > at > java.io.RandomAccessFile.(RandomAccessFile.java:212) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.util.BufferedRandomAccessFile.(BufferedRandomAccessFile.java:113) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.io.sstable.SSTableWriter.(SSTableWriter.java:78) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.ColumnFamilyStore.createCompactionWriter(ColumnFamilyStore.java:2243) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager.maybeCreateWriter(CompactionManager.java:794) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:534) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56) > >> >> > at > >> >> > > >> >> > > >> >> > > org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195) > >> >> > at > >> >> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > >> >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > >> >> > at > >> >> > > >> >> > > >> >> > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > >> >> > at java.lang.Thread.run(Thread.java:619) > >> >> > Dominic > >> > > >> > > > > > > --0015176f0e3e2fd13204a5ec6fe5 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Yeah that would get the count (although I don't think you can see filen= ames - or maybe I just don't know how). Unfortunately that node was shu= t down. I then tried restarting with storage port 7001 to isolate as was qu= ite toxic for performance of cluster but it now get's OOM on restart.
If it's really worth knowing I'll do the "remov= e commit logs and feed them back in slowly on successive restarts" rou= tine to get it up again and then run scrub to see what happens. I'm sur= e the result will be the same.

If there are any committers out there who want to poke = around I can also arrange to give password as node will be decommissioned a= nyway? You can then take a look at the SSTable corruption too etc

On 17 June 2011 18:06, Ryan = King <ryan@twitter= .com> wrote:
Even without lsof, you should be able to get the data from /proc/$pid

-ryan

On Fri, Jun 17, 2011 at 5:08 AM, Dominic Williams
<dwilliams@system7.co.uk> wrote:
> Unfortunately I shutdown that node and anyway lsof wasn't installe= d.
> But $ulimit gives
> unlimited
>
> On 17 June 2011 13:00, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>
>> On Fri, Jun 17, 2011 at 1:51 PM, Dominic Williams
>> <dwilliams@system7.c= o.uk> wrote:
>> > As far as scrub goes that could be it. I'm already runnin= g unlimited
>> > file
>> > handles though so ulimit not answer unfortunately
>>
>> Are you sure ? How many file descriptors are open on the system wh= en
>> you get that
>> scrub exception ? If you really have unlimited file handles and >> Cassandra uses them
>> all, then we have a more serious problem than just CASSANDRA-2669.=
>>
>> --
>> Sylvain
>>
>> > Dominic
>> > On 17 June 2011 12:12, Sylvain Lebresne <sylvain@datastax.com> wrote:
>> >>
>> >> Scrub apparently dies because it cannot acquire a file de= scriptor.
>> >> Scrub
>> >> does
>> >> not correctly closes files
>> >> (https://issues.apache.org/jira/browse/CASSANDRA-= 2669)
>> >> so that may be part of why that happens. However, a simpl= e fix is
>> >> probably
>> >> to
>> >> raise up the file descriptor limit.
>> >>
>> >> --
>> >> Sylvain
>> >>
>> >>
>> >> On Fri, Jun 17, 2011 at 11:31 AM, Dominic Williams
>> >> <dwilliams@= system7.co.uk> wrote:
>> >> > Hi all,
>> >> > Anyone experiencing this..?
>> >> > I noticed one of my 7.6-2 nodes had inexplicable and= consistently
>> >> > high
>> >> > cpu
>> >> > usage. Checking the log I found that there was a som= e kind of SSTable
>> >> > corruption that was stopping a bunch of files from c= ompacting (first
>> >> > trace
>> >> > copied below).
>> >> > I then tried scrub (before anyone asks it had been r= un when
>> >> > upgrading,
>> >> > but I
>> >> > was just trying to find out if this fixed it) but th= at failed with
>> >> > "Too
>> >> > many
>> >> > open files" even though node should be using me= m map (second trace
>> >> > copied
>> >> > below).
>> >> > Current plan is to snapshot, decommission & boot= strap but not what
>> >> > should be
>> >> > happening. Only factor I can think of that might hav= e caused problem
>> >> > is
>> >> > we
>> >> > are currently running huge GCGraceSeconds because we= can't run repair
>> >> > (we
>> >> > are hanging on for 8.1 as CASSANDRA-2280 causes repa= ir to consume all
>> >> > disk
>> >> > space and fail)
>> >> > =A0INFO [NonPeriodicTasks:1] 2011-06-16 22:43:45,232= SSTable.java (line
>> >> > 147)
>> >> > Deleted /var/opt/cassandra/data/FightMyMonster/Users= _CisIndex-f-5922
>> >> > =A0INFO [MutationStage:7989] 2011-06-16 22:49:57,852=
>> >> > ColumnFamilyStore.java
>> >> > (line 1065) Enqueuing flush of Memtable-UserMonsters= @423965881(34471
>> >> > bytes,
>> >> > 322112 operations)
>> >> > =A0INFO [FlushWriter:1176] 2011-06-16 22:49:57,917 M= emtable.java (line
>> >> > 157)
>> >> > Writing Memtable-UserMonsters@423965881(34471 bytes,= 322112
>> >> > operations)
>> >> > =A0INFO [FlushWriter:1176] 2011-06-16 22:50:01,087 M= emtable.java (line
>> >> > 172)
>> >> > Completed flushing
>> >> > /var/opt/cassandra/data/FightMyMonster/UserMonsters-= f-7614-Data.db
>> >> > (7250239
>> >> > bytes)
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-16 22:50:01,0= 90
>> >> > CompactionManager.java
>> >> > (line 395) Compacting
>> >> >
>> >> >
>> >> > [SSTableReader(path=3D'/var/opt/cassandra/data/F= ightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader(
>> >> >
>> >> >
>> >> > path=3D'/var/opt/cassandra/data/FightMyMonster/U= serMonsters-f-6994-Data.db'),SSTableReader(path=3D'/var/opt/cassand= ra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path= =3D'/var/opt/cassandra
>> >> >
>> >> >
>> >> > /data/FightMyMonster/UserMonsters-f-6996-Data.db'= ;),SSTableReader(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMon= sters-f-6998-Data.db'),SSTableReader(path=3D'/var/opt/cassandra/dat= a/FightMyMonster/Use
>> >> >
>> >> >
>> >> > rMonsters-f-7000-Data.db'),SSTableReader(path=3D= '/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'= ;),SSTableReader(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMon= sters-f-7004-Data.db
>> >> >
>> >> >
>> >> > '),SSTableReader(path=3D'/var/opt/cassandra/= data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path=3D= '/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'= ;),SSTableReader(path=3D'/
>> >> >
>> >> >
>> >> > var/opt/cassandra/data/FightMyMonster/UserMonsters-f= -7010-Data.db'),SSTableReader(path=3D'/var/opt/cassandra/data/Fight= MyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path=3D'/var/= opt/cassandra/data/F
>> >> >
>> >> >
>> >> > ightMyMonster/UserMonsters-f-7014-Data.db'),SSTa= bleReader(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMonsters-f= -7016-Data.db'),SSTableReader(path=3D'/var/opt/cassandra/data/Fight= MyMonster/UserMonste
>> >> >
>> >> >
>> >> > rs-f-7018-Data.db'),SSTableReader(path=3D'/v= ar/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTa= bleReader(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMonsters-f= -7022-Data.db'),SSTa
>> >> >
>> >> >
>> >> > bleReader(path=3D'/var/opt/cassandra/data/FightM= yMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path=3D'/var/o= pt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableR= eader(path=3D'/var/opt
>> >> >
>> >> >
>> >> > /cassandra/data/FightMyMonster/UserMonsters-f-7028-D= ata.db'),SSTableReader(path=3D'/var/opt/cassandra/data/FightMyMonst= er/UserMonsters-f-7030-Data.db'),SSTableReader(path=3D'/var/opt/cas= sandra/data/FightMyM
>> >> >
>> >> >
>> >> > onster/UserMonsters-f-7032-Data.db'),SSTableRead= er(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-D= ata.db'),SSTableReader(path=3D'/var/opt/cassandra/data/FightMyMonst= er/UserMonsters-f-70
>> >> >
>> >> >
>> >> > 36-Data.db'),SSTableReader(path=3D'/var/opt/= cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableRead= er(path=3D'/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-D= ata.db'),SSTableRead
>> >> >
>> >> >
>> >> > er(path=3D'/var/opt/cassandra/data/FightMyMonste= r/UserMonsters-f-7042-Data.db'),SSTableReader(path=3D'/var/opt/cass= andra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(p= ath=3D'/var/opt/cassan
>> >> >
>> >> >
>> >> > dra/data/FightMyMonster/UserMonsters-f-7046-Data.db&= #39;),SSTableReader(path=3D'/var/opt/cassandra/data/FightMyMonster/User= Monsters-f-7048-Data.db'),SSTableReader(path=3D'/var/opt/cassandra/= data/FightMyMonster/
>> >> >
>> >> >
>> >> > UserMonsters-f-7050-Data.db'),SSTableReader(path= =3D'/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db&= #39;)]
>> >> > =A0INFO [NonPeriodicTasks:1] 2011-06-16 22:50:02,399=
>> >> > ColumnFamilyStore.java
>> >> > (line 1065) Enqueuing flush of Memtable-Monsters@562= 380513(497561
>> >> > bytes,
>> >> > 13860 operations)
>> >> > =A0INFO [FlushWriter:1176] 2011-06-16 22:50:02,399 M= emtable.java (line
>> >> > 157)
>> >> > Writing Memtable-Monsters@562380513(497561 bytes, 13= 860 operations)
>> >> > =A0INFO [FlushWriter:1176] 2011-06-16 22:50:02,580 M= emtable.java (line
>> >> > 172)
>> >> > Completed flushing
>> >> > /var/opt/cassandra/data/FightMyMonster/Monsters-f-71= 60-Data.db
>> >> > (471100
>> >> > bytes)
>> >> > ERROR [CompactionExecutor:1] 2011-06-16 22:50:37,776=
>> >> > AbstractCassandraDaemon.java (line 114) Fatal except= ion in thread
>> >> > Thread[CompactionExecutor:1,1,main]
>> >> > java.io.IOError: java.io.IOException: Corrupt (negat= ive) value length
>> >> > encountered
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.util.ColumnIterator.deserial= izeNext(ColumnSortedMap.java:252)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.util.ColumnIterator.next(Col= umnSortedMap.java:268)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.util.ColumnIterator.next(Col= umnSortedMap.java:227)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ConcurrentSkipListMap.buildFrom= Sorted(ConcurrentSkipListMap.java:1493)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ConcurrentSkipListMap.<init&= gt;(ConcurrentSkipListMap.java:1443)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.SuperColumnSerializer.deseri= alize(SuperColumn.java:379)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.SuperColumnSerializer.deseri= alize(SuperColumn.java:362)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.SuperColumnSerializer.deseri= alize(SuperColumn.java:322)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.ColumnFamilySerializer.deser= ializeColumns(ColumnFamilySerializer.java:129)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.sstable.SSTableIdentityItera= tor.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> > org.apache.cassandra.io.PrecompactedRow.<init>= (PrecompactedRow.java:78)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.CompactionIterator.getCompac= tedRow(CompactionIterator.java:154)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.CompactionIterator.getReduce= d(CompactionIterator.java:110)
>> >> > =A0 =A0 =A0 =A0at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.CompactionIterator.getReduce= d(CompactionIterator.java:45)
>> >> > =A0 =A0 =A0 =A0at
>> >> >
>> >> >
>> >> > org.apache.cassandra.utils.ReducingIterator.computeN= ext(ReducingIterator.java:74)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > com.google.common.collect.AbstractIterator.tryToComp= uteNext(AbstractIterator.java:136)
>> >> > =A0 =A0 =A0 =A0at
>> >> >
>> >> >
>> >> > com.google.common.collect.AbstractIterator.hasNext(A= bstractIterator.java:131)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.commons.collections.iterators.FilterItera= tor.setNextObject(FilterIterator.java:183)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.commons.collections.iterators.FilterItera= tor.hasNext(FilterIterator.java:94)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager.doCompacti= on(CompactionManager.java:448)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager$1.call(Com= pactionManager.java:124)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager$1.call(Com= pactionManager.java:94)
>> >> > =A0 =A0 =A0 =A0 at
>> >> > java.util.concurrent.FutureTask$Sync.innerRun(Future= Task.java:303)
>> >> > =A0 =A0 =A0 =A0 at java.util.concurrent.FutureTask.r= un(FutureTask.java:138)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor$Worker.runTa= sk(ThreadPoolExecutor.java:886)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(T= hreadPoolExecutor.java:908)
>> >> > =A0 =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:= 619)
>> >> > Caused by: java.io.IOException: Corrupt (negative) v= alue length
>> >> > encountered
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.utils.ByteBufferUtil.readWithLe= ngth(ByteBufferUtil.java:315)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.ColumnSerializer.deserialize= (ColumnSerializer.java:99)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.util.ColumnIterator.deserial= izeNext(ColumnSortedMap.java:248)
>> >> > =A0 =A0 =A0 =A0 ... 26 more
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-16 22:50:37,7= 94
>> >> > CompactionManager.java
>> >> > (line 395) Compacting
>> >> >
>> >> >
>> >> > [SSTableReader(path=3D'/var/opt/cassandra/data/F= ightMyMonster/Monsters-f-7157-Data.db'),SSTableReader(path
>> >> >
>> >> >
>> >> > =3D'/var/opt/cassandra/data/FightMyMonster/Monst= ers-f-7158-Data.db'),SSTableReader(path=3D'/var/opt/cassandra/data/= FightMyMonster/Monsters-f-7159-Data.db'),SSTableReader(path=3D'/var= /opt/cassandra/data/FightM
>> >> > yMonster/Monsters-f-7160-Data.db')]
>> >> >
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:24,7= 56
>> >> > CompactionManager.java
>> >> > (line 652) Scrub of
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7554-Data.db')
>> >> > complete: 244 rows in new sstable and 0 empty (tombs= toned) rows
>> >> > dropped
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:24,7= 56
>> >> > CompactionManager.java
>> >> > (line 511) Scrubbing
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7446-Data.db')
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:26,1= 15
>> >> > CompactionManager.java
>> >> > (line 652) Scrub of
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7446-Data.db')
>> >> > complete: 493 rows in new sstable and 0 empty (tombs= toned) rows
>> >> > dropped
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:26,1= 15
>> >> > CompactionManager.java
>> >> > (line 511) Scrubbing
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7018-Data.db')
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:26,2= 25
>> >> > CompactionManager.java
>> >> > (line 652) Scrub of
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7018-Data.db')
>> >> > complete: 275 rows in new sstable and 0 empty (tombs= toned) rows
>> >> > dropped
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:26,2= 26
>> >> > CompactionManager.java
>> >> > (line 511) Scrubbing
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7580-Data.db')
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:28,3= 83
>> >> > CompactionManager.java
>> >> > (line 652) Scrub of
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7580-Data.db')
>> >> > complete: 297 rows in new sstable and 0 empty (tombs= toned) rows
>> >> > dropped
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:28,3= 84
>> >> > CompactionManager.java
>> >> > (line 511) Scrubbing
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7574-Data.db')
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:29,3= 00
>> >> > CompactionManager.java
>> >> > (line 652) Scrub of
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7574-Data.db')
>> >> > complete: 347 rows in new sstable and 0 empty (tombs= toned) rows
>> >> > dropped
>> >> > =A0INFO [CompactionExecutor:1] 2011-06-17 00:46:29,3= 00
>> >> > CompactionManager.java
>> >> > (line 511) Scrubbing
>> >> >
>> >> >
>> >> > SSTableReader(path=3D'/var/opt/cassandra/data/Fi= ghtMyMonster/UserMonsters-f-7010-Data.db')
>> >> > ERROR [CompactionExecutor:1] 2011-06-17 00:46:29,374=
>> >> > AbstractCassandraDaemon.java (line 114) Fatal except= ion in thread
>> >> > Thread[CompactionExecutor:1,1,main]
>> >> > java.io.FileNotFoundException:
>> >> >
>> >> > /var/opt/cassandra/data/FightMyMonster/UserMonsters-= tmp-f-7823-Data.db
>> >> > (Too
>> >> > many open files)
>> >> > =A0 =A0 =A0 =A0 at java.io.RandomAccessFile.open(Nat= ive Method)
>> >> > =A0 =A0 =A0 =A0 at java.io.RandomAccessFile.<init= >(RandomAccessFile.java:212)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.util.BufferedRandomAccessFil= e.<init>(BufferedRandomAccessFile.java:113)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.io.sstable.SSTableWriter.<in= it>(SSTableWriter.java:78)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.ColumnFamilyStore.createComp= actionWriter(ColumnFamilyStore.java:2243)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager.maybeCreat= eWriter(CompactionManager.java:794)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager.doScrub(Co= mpactionManager.java:534)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager.access$600= (CompactionManager.java:56)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > org.apache.cassandra.db.CompactionManager$3.call(Com= pactionManager.java:195)
>> >> > =A0 =A0 =A0 =A0 at
>> >> > java.util.concurrent.FutureTask$Sync.innerRun(Future= Task.java:303)
>> >> > =A0 =A0 =A0 =A0 at java.util.concurrent.FutureTask.r= un(FutureTask.java:138)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor$Worker.runTa= sk(ThreadPoolExecutor.java:886)
>> >> > =A0 =A0 =A0 =A0 at
>> >> >
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(T= hreadPoolExecutor.java:908)
>> >> > =A0 =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:= 619)
>> >> > Dominic
>> >
>> >
>
>

--0015176f0e3e2fd13204a5ec6fe5--