From user-return-16393-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Tue May 3 18:22:14 2011 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7AD682FB7 for ; Tue, 3 May 2011 18:22:14 +0000 (UTC) Received: (qmail 17446 invoked by uid 500); 3 May 2011 18:22:12 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 17406 invoked by uid 500); 3 May 2011 18:22:12 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 17396 invoked by uid 99); 3 May 2011 18:22:12 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 May 2011 18:22:12 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of tmarthinussen@gmail.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qw0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 May 2011 18:22:07 +0000 Received: by qwc23 with SMTP id 23so255020qwc.31 for ; Tue, 03 May 2011 11:21:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=j6jPXwTTpnOE0M9zlq9DjDElALT4zaFltBSbXV2AwsM=; b=TajX3+tBh6qfQWmYuR/b/Fu0UkJZSCdz56NImJxaWGr5GhC9q2L5U3RR2hIrm1N/GX aBnK0pBha4vsvlEf2sOqnFpbmYRjX8o2OyCJQdle7RjGC4TF8v7dl2wK9jWj9E0gqqsR vPzSluIjREPzSF1c1TG6SUj7CxCAn+RnTK1Vc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=WzGKkQEk82GdYqgKCGgwuGmk+oCfR7KKpStYD3MJQUcMCeXJc7ZHh47lh3d4xV3SeA ER9PF+qfenuQeed5kzf+I0+bWmd5bSrfVGxRdDVigcxXLXhBYvigVTfQy/IOmy2oCE+L RBToiD5MAI5C/Nc0IpbM2nOJgOKIuL7Ww3JZ8= MIME-Version: 1.0 Received: by 10.224.213.193 with SMTP id gx1mr204664qab.44.1304446906572; Tue, 03 May 2011 11:21:46 -0700 (PDT) Received: by 10.229.88.19 with HTTP; Tue, 3 May 2011 11:21:46 -0700 (PDT) In-Reply-To: References: Date: Wed, 4 May 2011 03:21:46 +0900 Message-ID: Subject: Re: MemtablePostFlusher with high number of pending calls? From: Terje Marthinussen To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=20cf3005dccc7ea15b04a26336fe --20cf3005dccc7ea15b04a26336fe Content-Type: text/plain; charset=ISO-8859-1 So yes, there is currently some 200GB empty disk. On Wed, May 4, 2011 at 3:20 AM, Terje Marthinussen wrote: > Just some very tiny amount of writes in the background here (some hints > spooled up on another node slowly coming in). > No new data. > > I thought there was no exceptions, but I did not look far enough back in > the log at first. > > Going back a bit further now however, I see that about 50 hours ago: > ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027 > AbstractCassandraDaemon.java (line 112) Fatal exception in thread > Thread[CompactionExecutor:387,1,main] > java.io.IOException: No space left on device > at java.io.RandomAccessFile.writeBytes(Native Method) > at java.io.RandomAccessFile.write(RandomAccessFile.java:466) > at > org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:160) > at > org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:225) > at > org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:356) > at > org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:335) > at > org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:130) > at > org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:566) > at > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:146) > at > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:112) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > [followed by a few more of those...] > > and then a bunch of these: > ERROR [FlushWriter:123] 2011-05-02 01:21:12,690 > AbstractCassandraDaemon.java (line 112) Fatal exception in thread > Thread[FlushWriter:123,5,main] > java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk > space to flush 40009184 bytes > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: Insufficient disk space to flush > 40009184 bytes > at > org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597) > at > org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100) > at > org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239) > at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50) > at > org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > > Seems like compactions stopped after this (a bunch of tmp tables there > still from when those errors where generated), and I can only suspect the > post flusher may have stopped at the same time. > > There is 890GB of disk for data, sstables are currently using 604G (139GB > is old tmp tables from when it ran out of disk) and "ring" tells me the load > on the node is 313GB. > > Terje > > > > On Wed, May 4, 2011 at 3:02 AM, Jonathan Ellis wrote: > >> ... and are there any exceptions in the log? >> >> On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis wrote: >> > Does it resolve down to 0 eventually if you stop doing writes? >> > >> > On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen >> > wrote: >> >> Cassandra 0.8 beta trunk from about 1 week ago: >> >> Pool Name Active Pending Completed >> >> ReadStage 0 0 5 >> >> RequestResponseStage 0 0 87129 >> >> MutationStage 0 0 187298 >> >> ReadRepairStage 0 0 0 >> >> ReplicateOnWriteStage 0 0 0 >> >> GossipStage 0 0 1353524 >> >> AntiEntropyStage 0 0 0 >> >> MigrationStage 0 0 10 >> >> MemtablePostFlusher 1 190 108 >> >> StreamStage 0 0 0 >> >> FlushWriter 0 0 302 >> >> FILEUTILS-DELETE-POOL 0 0 26 >> >> MiscStage 0 0 0 >> >> FlushSorter 0 0 0 >> >> InternalResponseStage 0 0 0 >> >> HintedHandoff 1 4 7 >> >> >> >> Anyone with nice theories about the pending value on the memtable post >> >> flusher? >> >> Regards, >> >> Terje >> > >> > >> > >> > -- >> > Jonathan Ellis >> > Project Chair, Apache Cassandra >> > co-founder of DataStax, the source for professional Cassandra support >> > http://www.datastax.com >> > >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > > --20cf3005dccc7ea15b04a26336fe Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable So yes, there is currently some 200GB empty disk.

On Wed, May 4, 2011 at 3:20 AM, Terje Marthinussen <tmarthinussen@gmail.com> wrote:
Just some very tiny amount of writes in the= background here (some hints spooled up on another node slowly coming in).<= div> No new data.

I thought there was no exceptions, but= =A0I did not look far enough back in the log at first.

Going back a bit further now however, I see that about = 50 hours ago:
ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,0= 27 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread= [CompactionExecutor:387,1,main]
java.io.IOException: No space left on device
=A0=A0 =A0 =A0 = =A0at java.io.RandomAccessFile.writeBytes(Native Method)
=A0=A0 = =A0 =A0 =A0at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.io.util.BufferedRandomAcce= ssFile.flush(BufferedRandomAccessFile.java:160)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.io.util.BufferedRandomAccess= File.reBuffer(BufferedRandomAccessFile.java:225)
=A0=A0 =A0 =A0 = =A0at org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(Buf= feredRandomAccessFile.java:356)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.io.util.BufferedRandomAccess= File.write(BufferedRandomAccessFile.java:335)
=A0=A0 =A0 =A0 =A0a= t org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.io.sstable.SSTableWriter= .append(SSTableWriter.java:130)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.CompactionManager.doCompa= ction(CompactionManager.java:566)
=A0=A0 =A0 =A0 =A0at org.apache= .cassandra.db.CompactionManager$1.call(CompactionManager.java:146)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.CompactionManager$1.call(Com= pactionManager.java:112)
=A0=A0 =A0 =A0 =A0at java.util.concurrent.FutureTask$Sync.innerRun(Fut= ureTask.java:303)
=A0=A0 =A0 =A0 =A0at java.util.concurrent.Futur= eTask.run(FutureTask.java:138)
=A0=A0 =A0 =A0 =A0at java.util.con= current.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
=A0=A0 =A0 =A0 =A0at java.util.concurrent.ThreadPoolExecutor$Worker.ru= n(ThreadPoolExecutor.java:908)
=A0=A0 =A0 =A0 =A0at java.lang.Thr= ead.run(Thread.java:662)
[followed by a few more of those...]

and then a bunch of these:
ERROR [FlushWriter:123] 2011= -05-02 01:21:12,690 AbstractCassandraDaemon.java (line 112) Fatal exception= in thread Thread[FlushWriter:123,5,main]
java.lang.RuntimeExcept= ion: java.lang.RuntimeException: Insufficient disk space to flush 40009184 = bytes
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.utils.WrappedRunnable.run(Wr= appedRunnable.java:34)
=A0=A0 =A0 =A0 =A0at java.util.concurrent.= ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
= =A0=A0 =A0 =A0 =A0at java.util.concurrent.ThreadPoolExecutor$Worker.run(Thr= eadPoolExecutor.java:908)
=A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662)
C= aused by: java.lang.RuntimeException: Insufficient disk space to flush 4000= 9184 bytes
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.ColumnFam= ilyStore.getFlushPath(ColumnFamilyStore.java:597)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.ColumnFamilyStore.createF= lushWriter(ColumnFamilyStore.java:2100)
=A0=A0 =A0 =A0 =A0at org.= apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.Memtable.access$400(Memtabl= e.java:50)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.Memtable$3.runMayThrow(Me= mtable.java:263)
=A0=A0 =A0 =A0 =A0at org.apache.cassandra.utils.= WrappedRunnable.run(WrappedRunnable.java:30)
=A0=A0 =A0 =A0 =A0..= . 3 more

Seems like compactions stopped after this (a bunch of tmp tables= there still from when those errors where generated), and I can only suspec= t the post flusher may have stopped at the same time.

There is 890GB of disk for data, sstables are currently using 604G (13= 9GB is old tmp tables from when it ran out of disk) and "ring" te= lls me the load on the node is 313GB.

Terje



On W= ed, May 4, 2011 at 3:02 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
... and are there any exceptions in the log?

On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis <
jbellis@gmail.com> wrote:
> Does it resolve down to 0 eventually if you stop doing writes?
>
> On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen
> <tmart= hinussen@gmail.com> wrote:
>> Cassandra 0.8 beta trunk from about 1 week ago:
>> Pool Name =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Active =A0 Pendin= g =A0 =A0 =A0Completed
>> ReadStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 = =A0 =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A05
>> RequestResponseStage =A0 =A0 =A0 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 = 0 =A0 =A0 =A0 =A0 =A087129
>> MutationStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A0 187298
>> ReadRepairStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 =A0 = =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> ReplicateOnWriteStage =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 =A0 =A0 0 = =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> GossipStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A01353524
>> AntiEntropyStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A00 =A0 =A0 =A0 = =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> MigrationStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A00 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 10
>> MemtablePostFlusher =A0 =A0 =A0 =A0 =A0 =A0 =A0 1 =A0 =A0 =A0 190 = =A0 =A0 =A0 =A0 =A0 =A0108
>> StreamStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> FlushWriter =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0302
>> FILEUTILS-DELETE-POOL =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 =A0 =A0 0 = =A0 =A0 =A0 =A0 =A0 =A0 26
>> MiscStage =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 = =A0 =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> FlushSorter =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 = =A0 =A0 0 =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> InternalResponseStage =A0 =A0 =A0 =A0 =A0 =A0 0 =A0 =A0 =A0 =A0 0 = =A0 =A0 =A0 =A0 =A0 =A0 =A00
>> HintedHandoff =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 1 =A0 =A0 = =A0 =A0 4 =A0 =A0 =A0 =A0 =A0 =A0 =A07
>>
>> Anyone with nice theories about the pending value on the memtable = post
>> flusher?
>> Regards,
>> Terje
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support<= br> > http://www.datas= tax.com
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.c= om


--20cf3005dccc7ea15b04a26336fe--