From: Jonathan Ellis
Date: Tue, 8 Mar 2011 20:58:36 -0600
Subject: Re: 0.7.3 nodetool scrub exceptions
To: user@cassandra.apache.org
Cc: Karl Hiramoto

Turn on debug logging and see if the output looks like what I posted
to https://issues.apache.org/jira/browse/CASSANDRA-2296

It *may* be harmless depending on where those zero-length rows are
coming from. I've added asserts to 0.7 branch that fire if we attempt
to write a zero-length row, so if the bug is still present in 0.7.3+
that should catch it.

On Tue, Mar 8, 2011 at 7:31 PM, Jonathan Ellis wrote:
> alienth on irc is reporting the same error. His path was 0.6.8 to
> 0.7.1 to 0.7.3.
>
> It's probably a bug in scrub. If we can get an sstable exhibiting the
> problem posted here or on Jira that would help troubleshoot.
>
> On Tue, Mar 8, 2011 at 10:31 AM, Karl Hiramoto wrote:
>> On 08/03/2011 17:09, Jonathan Ellis wrote:
>>>
>>> No.
>>>
>>> What is the history of your cluster?
>>
>> It started out as 0.7.0 - RC3 And I've upgraded 0.7.0, 0.7.1, 0.7.2,
>> 0.7.3 within a few days after each was released.
>>
>> I have 6 nodes about 10GB of data each RF=2. Only one CF every
>> row/column has a TTL of 24 hours.
>> I do a staggered repair/compact/cleanup across every node in a cronjob.
>>
>>
>> After upgrading to 0.7.3 I had a lot of nodes crashing due to OOM. I
>> reduced the key cache from the default 200000 to 1000 and increased the heap
>> size from 8GB to 12GB and the OOM crashes went away.
>>
>>
>> Anyway to fix this without throwing away all the data?
>>
>> Since i only keep data 24 hours, I could insert into two CF for the next 24
>> hours than after only valid data in new CF remove the old CF.
>>
>>
>>
>>> On Tue, Mar 8, 2011 at 5:34 AM, Karl Hiramoto wrote:
>>>>
>>>> I have 1000's of these in the log is this normal?
>>>>
>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>        at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>        at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>        at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>        at java.lang.Thread.run(Thread.java:636)
>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>>        ... 8 more
>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>        at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>        at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>        at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>        at java.lang.Thread.run(Thread.java:636)
>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>>        ... 8 more
>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>        at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>        at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>        at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>        at java.lang.Thread.run(Thread.java:636)
>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>        at org.apa
>>>>
>>>
>>>
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
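
For anyone wanting to gauge how many rows scrub is skipping, a rough way is to tally the CompactionManager WARN lines like those quoted in this thread. The sketch below is only an illustration: it assumes each WARN entry sits on a single line in the server log (the mail wrapped them), and the function name and the regular expression are made up for this example, not part of Cassandra.

```python
import re
from collections import Counter

# Matches WARN entries shaped like the ones quoted in this thread, e.g.
#  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
# Assumption: each entry is on one line in the actual log file.
WARN_RE = re.compile(
    r"WARN \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"CompactionManager\.java \(line \d+\) (?P<msg>.*)"
)


def tally_scrub_warnings(lines):
    """Count CompactionManager WARN messages, keyed by message text.

    Hypothetical helper for illustration; stack-trace lines and anything
    else that does not match the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group("msg").strip()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        " WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 "
        "CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)",
        "java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size",
        " WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 "
        "CompactionManager.java (line 625) Row is unreadable; skipping to next",
    ]
    for message, count in tally_scrub_warnings(sample).items():
        print(count, message)
```

To run it against a real log, pass an open file object instead of the sample list; the log path depends on the installation, so none is assumed here.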