Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A16849AF9 for ; Thu, 10 Nov 2011 20:23:36 +0000 (UTC) Received: (qmail 19940 invoked by uid 500); 10 Nov 2011 20:23:34 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 19817 invoked by uid 500); 10 Nov 2011 20:23:34 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 19809 invoked by uid 99); 10 Nov 2011 20:23:34 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Nov 2011 20:23:34 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of sylvain@datastax.com designates 209.85.213.171 as permitted sender) Received: from [209.85.213.171] (HELO mail-yx0-f171.google.com) (209.85.213.171) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 10 Nov 2011 20:23:27 +0000 Received: by yenl6 with SMTP id l6so3287949yen.30 for ; Thu, 10 Nov 2011 12:23:05 -0800 (PST) Received: by 10.101.179.34 with SMTP id g34mr2852624anp.18.1320956585202; Thu, 10 Nov 2011 12:23:05 -0800 (PST) MIME-Version: 1.0 Received: by 10.236.103.13 with HTTP; Thu, 10 Nov 2011 12:22:44 -0800 (PST) In-Reply-To: <4ebc2e5a.a5b4ec0a.3445.0f3d@mx.google.com> References: <4ebc2e5a.a5b4ec0a.3445.0f3d@mx.google.com> From: Sylvain Lebresne Date: Thu, 10 Nov 2011 21:22:44 +0100 Message-ID: Subject: Re: 1.0.2 Assertion Error To: user@cassandra.apache.org Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable That would be a bug (as any assertion error would be), likely some race condition. Could you open a ticket? The fact that this block the MemtablePostFlusher is unfortunately related. Restarting the node would fix but we need to make that more solid too. -- Sylvain On Thu, Nov 10, 2011 at 9:04 PM, Dan Hendry wro= te: > Just happened again, seems to be with the same column family (at least on= a > flusher thread for which the last activity was flushing a memtable for th= at > CF). > > > > It also looks like MemtablePostFlusher tasks blocked (and not getting > cleared) as evidenced by tpstats: > > > > Pool Name=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 Active= =A0=A0 Pending=A0=A0=A0=A0=A0 Completed=A0=A0 Blocked=A0 All > time blocked > > MemtablePostFlusher=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0= =A0=A0=A0=A0 18=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 =A0=A016 > 0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0 > > > > Dan > > > > From: Dan Hendry [mailto:dan.hendry.junk@gmail.com] > Sent: November-10-11 14:49 > To: 'user@cassandra.apache.org' > Subject: 1.0.2 Assertion Error > > > > Just saw this weird assertion after upgrading one of my nodes from 0.8.6 = to > 1.0.2 (its been running fine for a few hours now): > > > > INFO [FlushWriter:9] 2011-11-10 13:08:58,882 Memtable.java (line 237) > Writing Memtable-Data@1388955390(25676955/430716097 serialized/live bytes= , > 478913 ops) > > ERROR [FlushWriter:9] 2011-11-10 13:08:59,513 AbstractCassandraDaemon.jav= a > (line 133) Fatal exception in thread Thread[FlushWriter:9,5,main] > > java.lang.AssertionError: CF size changed during serialization: was 4 > initially but 3 written > > =A0=A0=A0=A0=A0=A0=A0 at > org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(Column= FamilySerializer.java:94) > > =A0=A0=A0=A0 =A0=A0=A0at > org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(Colum= nFamilySerializer.java:112) > > =A0=A0=A0=A0=A0=A0=A0 at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:1= 77) > > =A0=A0=A0=A0=A0=A0=A0 at > org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:264) > > =A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.Memtable.access$400(Memt= able.java:47) > > =A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.Memtable$4.runMayThrow(M= emtable.java:289) > > =A0=A0=A0=A0=A0=A0=A0 at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > > =A0=A0=A0=A0=A0=A0=A0at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) > > =A0=A0=A0=A0=A0=A0=A0 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) > > =A0=A0=A0=A0=A0=A0=A0 at java.lang.Thread.run(Thread.java:662) > > > > > > After taking a quick peek at the code, it looks like the numbers (=934 > initially but 3 written=94) refer to the number of columns (in the memtab= le?). > Given the byte size of the memtable being flushed, there are *certainly* > more than 4 columns. Besides this error, there are no other unexpected or > unusual log entries and the node seems to be behaving normally. > > > > Thoughts? > > > > Dan Hendry