Subject: Re: 0.7.3 nodetool scrub exceptions
From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Cc: Karl Hiramoto
Date: Tue, 8 Mar 2011 21:45:43 +0100
In-Reply-To: <4D7659FD.8030806@hiramoto.org>
References: <4D761457.8030003@hiramoto.org> <4D7659FD.8030806@hiramoto.org>
Did you run scrub as soon as you updated to 0.7.3? And did you have
problems/exceptions before running scrub? If yes, did you have problems
only with 0.7.3, or also with 0.7.2?

If the problems started with running scrub: since scrub takes a snapshot
before running, can you try restarting a test cluster from that snapshot
and see if a simple compaction works, for instance?

--
Sylvain

On Tue, Mar 8, 2011 at 5:31 PM, Karl Hiramoto <karl@hiramoto.org> wrote:
> On 08/03/2011 17:09, Jonathan Ellis wrote:
>> No.
>>
>> What is the history of your cluster?
>
> It started out as 0.7.0-rc3, and I've upgraded to 0.7.0, 0.7.1, 0.7.2
> and 0.7.3 within a few days after each was released.
>
> I have 6 nodes with about 10 GB of data each, RF=2. Only one CF; every
> row/column has a TTL of 24 hours.
> I do a staggered repair/compact/cleanup across every node in a cron job.
>
> After upgrading to 0.7.3 I had a lot of nodes crashing due to OOM. I
> reduced the key cache from the default 200000 to 1000 and increased the
> heap size from 8 GB to 12 GB, and the OOM crashes went away.
>
> Is there any way to fix this without throwing away all the data?
>
> Since I only keep data for 24 hours, I could insert into two CFs for the
> next 24 hours, then, once only valid data is in the new CF, remove the
> old CF.
>
>> On Tue, Mar 8, 2011 at 5:34 AM, Karl Hiramoto <karl@hiramoto.org> wrote:
>>> I have thousands of these in the log. Is this normal?
>>>
>>> java.io.IOError: java.io.EOFException: bloom filter claims to be longer
>>> than entire row size
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:117)
>>>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>>>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>>>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>         at java.lang.Thread.run(Thread.java:636)
>>> Caused by: java.io.EOFException: bloom filter claims to be longer than
>>> entire row size
>>>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:113)
>>>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:87)
>>>         ... 8 more
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java
>>> (line 625) Row is unreadable; skipping to next
>>>  WARN [CompactionExecutor:1] 2011-03-08 11:32:35,615 CompactionManager.java
>>> (line 599) Non-fatal error reading row (stacktrace follows)
>>> [the same exception and WARN pair repeat from here; log truncated]
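Sylvain's suggestion (restore the pre-scrub snapshot on a test node and try a plain compaction) could be sketched roughly as below. This is an illustration only: the keyspace name, CF name, SSTable filename and snapshot tag are made up, and a throwaway directory stands in for the real Cassandra data directory so the restore step can be shown end to end. In 0.7, per-keyspace snapshots live under `<data_dir>/<keyspace>/snapshots/<tag>/`.

```shell
#!/bin/sh
set -e

# Throwaway directory standing in for Cassandra's data directory
# (the real default is /var/lib/cassandra/data; keyspace, CF and
# snapshot tag below are hypothetical examples).
DATA=$(mktemp -d)
KS=Keyspace1

# Scrub snapshots each CF before rewriting it; fake one such snapshot:
mkdir -p "$DATA/$KS/snapshots/pre-scrub"
touch "$DATA/$KS/snapshots/pre-scrub/Standard1-f-1-Data.db"

# With the test node stopped, copy the snapshot's SSTables back into
# the keyspace's data directory:
cp "$DATA/$KS/snapshots/pre-scrub/"*.db "$DATA/$KS/"

ls -l "$DATA/$KS/"

# Then start Cassandra on the test node and force a compaction on the
# restored CF, to see whether it trips the same EOFException:
#   nodetool -h <test-node> compact Keyspace1 Standard1
```

If plain compaction over the snapshot data succeeds, that would point at scrub itself rather than pre-existing SSTable corruption.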