From: Jonathan Ellis
Date: Tue, 8 Mar 2011 21:35:15 -0600
Subject: Re: 0.7.3 nodetool scrub exceptions
To: user@cassandra.apache.org
Cc: Karl Hiramoto

Looks like it is harmless -- Scrub would write a zero-length row when
tombstones expire and there is nothing left, instead of writing no row
at all.  Fix attached to the Jira ticket.

On Tue, Mar 8, 2011 at 8:58 PM, Jonathan Ellis wrote:
> It *may* be harmless depending on where those zero-length rows are
> coming from.  I've added asserts to the 0.7 branch that fire if we attempt
> to write a zero-length row, so if the bug is still present in 0.7.3+
> that should catch it.
>
> On Tue, Mar 8, 2011 at 7:31 PM, Jonathan Ellis wrote:
>> alienth on IRC is reporting the same error.  His path was 0.6.8 to
>> 0.7.1 to 0.7.3.
>>
>> It's probably a bug in scrub.  If we can get an sstable exhibiting the
>> problem posted here or on Jira that would help troubleshoot.
>>
>> On Tue, Mar 8, 2011 at 10:31 AM, Karl Hiramoto wrote:
>>> On 08/03/2011 17:09, Jonathan Ellis wrote:
>>>>
>>>> No.
>>>>
>>>> What is the history of your cluster?
>>>
>>> It started out as 0.7.0-RC3, and I've upgraded to 0.7.0, 0.7.1, 0.7.2,
>>> and 0.7.3 within a few days after each was released.
>>>
>>> I have 6 nodes with about 10GB of data each, RF=2.  There is only one CF,
>>> and every row/column has a TTL of 24 hours.
>>> I do a staggered repair/compact/cleanup across every node in a cronjob.
>>>
>>> After upgrading to 0.7.3 I had a lot of nodes crashing due to OOM.  I
>>> reduced the key cache from the default 200000 to 1000 and increased the heap
>>> size from 8GB to 12GB, and the OOM crashes went away.
>>>
>>> Any way to fix this without throwing away all the data?
>>>
>>> Since I only keep data for 24 hours, I could insert into two CFs for the
>>> next 24 hours, then, once only valid data remains in the new CF, remove
>>> the old CF.
>>>
>>>> On Tue, Mar 8, 2011 at 5:34 AM, Karl Hiramoto wrote:
>>>>>
>>>>> I have 1000's of these in the log.  Is this normal?
>>>>>
>>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>         at java.lang.Thread.run(Thread.java:636)
>>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>>>         ... 8 more
>>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>         at java.lang.Thread.run(Thread.java:636)
>>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>>>         ... 8 more
>>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>         at java.lang.Thread.run(Thread.java:636)
>>>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>>>         at org.apa
>>>>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
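
For readers following the thread, the behavior described at the top of this message (write no row at all when every column has expired, and fail loudly on any attempt to write a zero-length row) can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch attached to the ticket: `ScrubWriterSketch`, `Column`, `serializeLive`, `appendRow`, and `scrub` are hypothetical names, the serialization format is made up, and an explicit exception stands in for the Java `assert` mentioned in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types are Cassandra classes.
class ScrubWriterSketch {

    /** A column that expires at an absolute time in milliseconds (hypothetical type). */
    static final class Column {
        final String name;
        final byte[] value;
        final long expiresAtMillis;

        Column(String name, byte[] value, long expiresAtMillis) {
            this.name = name;
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }

        boolean isLive(long nowMillis) {
            return nowMillis < expiresAtMillis;
        }
    }

    /** Serializes only the columns still live at nowMillis; the result is empty if none are. */
    static byte[] serializeLive(List<Column> columns, long nowMillis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Column c : columns) {
                if (!c.isLive(nowMillis)) {
                    continue; // expired column: nothing to keep
                }
                out.writeUTF(c.name);
                out.writeInt(c.value.length);
                out.write(c.value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Appends a serialized row; stands in for the assert against zero-length rows. */
    static void appendRow(List<byte[]> output, byte[] rowBytes) {
        if (rowBytes.length == 0) {
            throw new IllegalStateException("attempted to write a zero-length row");
        }
        output.add(rowBytes);
    }

    /** Scrub-like pass: rows with nothing left are skipped entirely. Returns rows written. */
    static int scrub(List<List<Column>> rows, long nowMillis, List<byte[]> output) {
        int written = 0;
        for (List<Column> row : rows) {
            byte[] bytes = serializeLive(row, nowMillis);
            if (bytes.length == 0) {
                continue; // write no row at all instead of an empty one
            }
            appendRow(output, bytes);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<Column>> rows = new ArrayList<>();
        rows.add(Arrays.asList(new Column("c1", new byte[] {1, 2}, 2000L))); // still live at t=1000
        rows.add(Arrays.asList(new Column("c2", new byte[] {3}, 500L)));     // expired at t=1000
        List<byte[]> output = new ArrayList<>();
        System.out.println("rows written: " + scrub(rows, 1000L, output));
    }
}
```

The guard in `appendRow` is kept separate from the skip in `scrub` on purpose: the skip is the fix, while the guard plays the role of the added asserts and would catch any other code path that still produces an empty row.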