Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 95680 invoked from network); 6 Dec 2010 16:22:44 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 6 Dec 2010 16:22:44 -0000 Received: (qmail 90066 invoked by uid 500); 6 Dec 2010 16:22:41 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 89896 invoked by uid 500); 6 Dec 2010 16:22:39 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 89887 invoked by uid 99); 6 Dec 2010 16:22:38 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Dec 2010 16:22:38 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=10.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of dan.hendry.junk@gmail.com designates 209.85.214.172 as permitted sender) Received: from [209.85.214.172] (HELO mail-iw0-f172.google.com) (209.85.214.172) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Dec 2010 16:22:33 +0000 Received: by iwn40 with SMTP id 40so15615208iwn.31 for ; Mon, 06 Dec 2010 08:22:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:references :in-reply-to:subject:date:message-id:mime-version:content-type :content-transfer-encoding:x-mailer:thread-index:content-language; bh=ggfliaqkGe5vuPQb7+rTLqTLv1gewANeUBmmQCK1W3A=; b=SYl/U21u2PXR3BXNJCmx4GoU9nFCbbaxEABtlAlt7dLxWTdkPJztvvvdaQ+b5qNG6l naB7VsModbC1+ZzuQQFi30dytXXA/qMoXAWVIrSfSQ8Djt8XA9taf5Ih9yrEgRaiLYOh gI0gahJ6NY/aq0y66FLLZK2uK+A1qIdal5/30= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:references:in-reply-to:subject:date:message-id:mime-version :content-type:content-transfer-encoding:x-mailer:thread-index :content-language; b=LbbnPCYyiXXc164g/GFRTBarw/fMhE3Rrg0r1jsAlv5O2vFYhKmiMX8wRFvWW/FBtC 1Vu0n3bvf3a7c+XvsLx2cL19BwO8lpLABIcfLVLpaOTL7hFQyTsynh+JxkWa+F6953qH 2t3+lHwjdUFUAd01Qcu48SuVN6iOq3PwboVg8= Received: by 10.231.31.136 with SMTP id y8mr5996742ibc.145.1291652532538; Mon, 06 Dec 2010 08:22:12 -0800 (PST) Received: from DHTABLET (kik.client.acceleratorcentre.net [38.121.79.182]) by mx.google.com with ESMTPS id 34sm5150757ibi.20.2010.12.06.08.22.10 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 06 Dec 2010 08:22:10 -0800 (PST) From: "Dan Hendry" To: References: In-Reply-To: Subject: RE: Various exceptions on 0.7 Date: Mon, 6 Dec 2010 11:22:07 -0500 Message-ID: <4cfd0db2.6202e70a.34b3.ffffcbb7@mx.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Office Outlook 12.0 thread-index: AcuUMl9mHH06oVWLSOewEFXzv13ESQBJ6gGw Content-Language: en-ca > bleeding edge code you are running (did you try rc1?) or you do have = nodes on different versions All nodes are running code from https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7 which = I thought was essentially RC1 with fixes but I will give the actual = release a try. > you have a hardware problem Hard to say. I don=92t think so, everything else seems to be working = fine. I will try and run some diagnostics on the two nodes which seem to be = acting up. Now for some new developments; the plot thickens. I am fairly sure there = is a corrupt ColumnFamily/SSTable. After a restart, two adjacent nodes both show the following error. After which the CompactionManager pending = tasks never returns to zero. I am fairly sure this cf is not getting compacted = but compaction for other column families seems to continue. In order to get = rid of all these errors I have to perform a truncate operation using the = cli, after which I get the same IndexOutOfBounds exception. Can I just shut = down the node (draining first), and delete all data files related to this = column family on the two problematic nodes? The data they contain is reasonably unimportant and I don=92t mind loosing it. ERROR [CompactionExecutor:1] 2010-12-06 05:07:56,736 AbstractCassandraDaemon.java (line 90) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Buffer.java:520) at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:340) at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.= jav a:57) at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedSuper(ColumnFamily= Sto re.java:818) at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(Column= Fam ilyStore.java:781) at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore= .ja va:774) at org.apache.cassandra.io.PrecompactedRow.(PrecompactedRow.java:93) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIter= ato r.java:138) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.= jav a:107) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.= jav a:42) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= jav a:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor .java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131 ) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(Fil= ter Iterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIte= rat or.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.= jav a:321) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:1= 24) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:9= 7) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) at java.lang.Thread.run(Thread.java:662) Dan -----Original Message----- From: Jonathan Ellis [mailto:jbellis@gmail.com]=20 Sent: December-04-10 22:45 To: user Subject: Re: Various exceptions on 0.7 At least one of your nodes is sending garbage to the others. Either there's a bug in the bleeding edge code you are running (did you try rc1?) or you do have nodes on different versions or you have a hardware problem. On Sat, Dec 4, 2010 at 5:51 PM, Dan Hendry wrote: > Here are two other errors which appear frequently: > ERROR [MutationStage:29] 2010-12-04 17:47:46,931 RowMutationVerbHandler.java > (line 83) Error in row mutation > java.io.IOException: Invalid localDeleteTime read: 0 > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 55) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 12) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnF= ami lySerializer.java:129) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySe= ria lizer.java:120) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation= .ja va:383) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 93) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 51) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:52) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) > =A0=A0 =A0 =A0 =A0at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) > =A0=A0 =A0 =A0 =A0at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) > =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) > > ERROR [MutationStage:15] 2010-12-04 17:48:33,216 RowMutationVerbHandler.java > (line 83) Error in row mutation > org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: = invalid > column name length 0 > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:6 8) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 63) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 12) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnF= ami lySerializer.java:129) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySe= ria lizer.java:120) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation= .ja va:383) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 93) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 51) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:52) > =A0=A0 =A0 =A0 =A0at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) > =A0=A0 =A0 =A0 =A0at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) > =A0=A0 =A0 =A0 =A0at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) > =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) > > On Sat, Dec 4, 2010 at 6:29 PM, Dan Hendry > wrote: >> >> No, all nodes are running very recent (< 2 day old) code out of the = 0.7 >> branch. This cluster has always had 0.7 RC1(+) code running on it >> >> On Sat, Dec 4, 2010 at 6:24 PM, Jonathan Ellis = wrote: >>> >>> Are you mixing different Cassandra versions? >>> >>> On Sat, Dec 4, 2010 at 4:58 PM, Dan Hendry = >>> wrote: >>> > To be clear, I had to interrupt a clean operation earlier in the = day be >>> > killing the cassandra process. Now the node works for awhile, >>> > but=A0continually=A0logging the "Error in row mutation" errors = then >>> > eventually >>> > logs a "Fatal exception in thread" error. After which, the process >>> > stays >>> > alive but there seem to be problems reading from the node. At the = very >>> > least, read performance is massively degraded. >>> > >>> > On Sat, Dec 4, 2010 at 5:52 PM, Dan Hendry = >>> > wrote: >>> >> >>> >> One of my Cassandra nodes is giving me a number of errors then >>> >> effectively >>> >> dying. I think it was somehow caused by interrupting a nodetool = clean >>> >> operation. Running a recent 0.7 build out of svn. >>> >> ERROR [MutationStage:26] 2010-12-04 16:23:04,395 >>> >> RowMutationVerbHandler.java (line 83) Error in row mutation >>> >> java.io.EOFException >>> >> =A0=A0 =A0 =A0 =A0at java.io.DataInputStream.readFully(DataInputStream.java:180) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:264= ) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:7 6) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 63) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:3 12) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnF= ami lySerializer.java:129) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySe= ria lizer.java:120) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation= .ja va:383) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 93) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 51) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:52) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) >>> >> =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) >>> >> ERROR [MutationStage:13] 2010-12-04 16:25:04,061 >>> >> RowMutationVerbHandler.java (line 83) Error in row mutation >>> >> org.apache.cassandra.db.UnserializableColumnFamilyException: = Couldn't >>> >> find >>> >> cfId=3D524288 >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySe= ria lizer.java:117) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation= .ja va:383) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 93) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.jav= a:3 51) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:52) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) >>> >> =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) >>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 >>> >> DebuggableThreadPoolExecutor.java (line 103) Error in >>> >> ThreadPoolExecutor >>> >> java.lang.NullPointerException >>> >> =A0=A0 =A0 =A0 =A0at = org.apache.cassandra.db.Table.apply(Table.java:398) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:73) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) >>> >> =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) >>> >> ERROR [MutationStage:20] 2010-12-04 16:25:25,216 >>> >> AbstractCassandraDaemon.java (line 90) Fatal exception in thread >>> >> Thread[MutationStage:20,5,main] >>> >> java.lang.NullPointerException >>> >> =A0=A0 =A0 =A0 =A0at = org.apache.cassandra.db.Table.apply(Table.java:398) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHand= ler .java:73) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java= :63 ) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .ja va:886) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:9 08) >>> >> =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) >>> >> ERROR [COMMIT-LOG-WRITER] 2010-12-04 16:25:25,216 >>> >> AbstractCassandraDaemon.java (line 90) Fatal exception in thread >>> >> Thread[COMMIT-LOG-WRITER,5,main] >>> >> java.lang.RuntimeException: java.lang.NullPointerException >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) >>> >> =A0=A0 =A0 =A0 =A0at java.lang.Thread.run(Thread.java:662) >>> >> Caused by: java.lang.NullPointerException >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment= .ja va:92) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.= jav a:509) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runM= ayT hrow(PeriodicCommitLogExecutorService.java:52) >>> >> =A0=A0 =A0 =A0 =A0at >>> >> >>> >> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) >>> >> =A0=A0 =A0 =A0 =A0... 1 more >>> > >>> > >>> >>> >>> >>> -- >>> Jonathan Ellis >>> Project Chair, Apache Cassandra >>> co-founder of Riptano, the source for professional Cassandra support >>> http://riptano.com >> > > --=20 Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com No virus found in this incoming message. Checked by AVG - www.avg.com=20 Version: 9.0.872 / Virus Database: 271.1.1/3300 - Release Date: 12/06/10 02:34:00