From: Jonathan Ellis
Date: Tue, 8 Mar 2011 19:31:01 -0600
Subject: Re: 0.7.3 nodetool scrub exceptions
To: user@cassandra.apache.org
Cc: Karl Hiramoto

alienth on IRC is reporting the same error. His upgrade path was 0.6.8 to
0.7.1 to 0.7.3.

It's probably a bug in scrub. If we can get an sstable exhibiting the
problem posted here or on Jira, that would help troubleshoot.

On Tue, Mar 8, 2011 at 10:31 AM, Karl Hiramoto wrote:
> On 08/03/2011 17:09, Jonathan Ellis wrote:
>>
>> No.
>>
>> What is the history of your cluster?
>
> It started out as 0.7.0-RC3, and I've upgraded to 0.7.0, 0.7.1, 0.7.2, and
> 0.7.3 within a few days after each was released.
>
> I have 6 nodes with about 10GB of data each, RF=2. There is only one CF,
> and every row/column has a TTL of 24 hours.
> I do a staggered repair/compact/cleanup across every node in a cron job.
>
> After upgrading to 0.7.3 I had a lot of nodes crashing due to OOM. I
> reduced the key cache from the default 200000 to 1000 and increased the
> heap size from 8GB to 12GB, and the OOM crashes went away.
>
> Is there any way to fix this without throwing away all the data?
>
> Since I only keep data for 24 hours, I could insert into two CFs for the
> next 24 hours and then, once the new CF holds only valid data, remove the
> old CF.
>
>> On Tue, Mar 8, 2011 at 5:34 AM, Karl Hiramoto wrote:
>>>
>>> I have 1000's of these in the log. Is this normal?
>>>
>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>         at java.lang.Thread.run(Thread.java:636)
>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>         ... 8 more
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>         at java.lang.Thread.run(Thread.java:636)
>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>         ... 8 more
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 625) Row is unreadable; skipping to next
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java (line 599) Non-fatal error reading row (stacktrace follows)
>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>         at java.lang.Thread.run(Thread.java:636)
>>> Caused by: java.io.EOFException: bloom filter claims to be longer than entire row size
>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>         at org.apa
>>>
>>
>>
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
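
For readers trying to interpret the exception text in the quoted log: the
message suggests a sanity check on a length-prefixed field, where the reader
refuses a claimed length that cannot fit in the bytes available for the row.
The sketch below illustrates that kind of check only. It is not the Cassandra
source; the class name BloomFilterLengthCheck and the helpers
readLengthPrefixed, row, and isReadable are hypothetical, and the 4-byte
big-endian length prefix is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative only: hypothetical names, not Cassandra's IndexHelper code.
final class BloomFilterLengthCheck {

    // Reads a 4-byte length followed by that many bytes, rejecting lengths
    // that could not possibly fit in the remaining row bytes (maxSize).
    static byte[] readLengthPrefixed(DataInputStream in, long maxSize) throws IOException {
        int claimed = in.readInt();
        if (claimed > maxSize) {
            throw new EOFException("bloom filter claims to be longer than entire row size");
        }
        if (claimed < 0) {
            throw new IOException("negative filter length: " + claimed);
        }
        byte[] filter = new byte[claimed];
        in.readFully(filter);
        return filter;
    }

    // Builds a fake row: a length prefix (possibly wrong) plus zeroed payload bytes.
    static byte[] row(int claimedLength, int actualBytes) {
        // ByteBuffer is big-endian by default, which matches DataInputStream.readInt.
        return ByteBuffer.allocate(Integer.BYTES + actualBytes).putInt(claimedLength).array();
    }

    // Mirrors the "Row is unreadable; skipping to next" behaviour: report and move on.
    static boolean isReadable(byte[] row) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(row))) {
            readLengthPrefixed(in, row.length - Integer.BYTES);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReadable(row(16, 16)));      // prints true
        System.out.println(isReadable(row(1 << 20, 16))); // prints false
    }
}
```

Under these assumptions, a row whose length prefix has been misread or
corrupted fails the check immediately instead of triggering a huge
allocation, which would be consistent with scrub logging the row as
unreadable and continuing with the next one.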