From: Matt Kennedy
To: user@cassandra.apache.org
Date: Fri, 4 Mar 2011 12:32:23 -0500
Subject: Cluster not starting up

I'm currently the proud owner of an 8-node cluster that won't start up.

Yesterday a developer was doing very high-volume writes to our cluster via a Hadoop job that read an HDFS file, ran six concurrent mappers on each of the 8 nodes, and used Hector to do the load. It sort of killed Cassandra. The cluster was running 0.7.0, and the job actually killed three of the nodes with OutOfMemory errors before he realized something was awry and killed the job. He then tried to get rid of the keyspace by dropping it in the CLI and got the following error:

javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=devks,columnfamily=OriginCF

So he punted to me, and I decided to just try restarting the cluster in the hope that it would sort itself out. The nodes that were still up shut down gracefully with the stop-server command, no kill -9s required. But when I tried to start the nodes again, they all failed with stack traces.

My googling led me to this: https://issues.apache.org/jira/browse/CASSANDRA-2197

So I upgraded to 0.7.2 and tried restarting. Once again, all the nodes fail, each with one of two different stack traces; both types occur immediately after an INFO message of the form:

INFO 12:06:26,979 Finished reading /path/to/commitlog/etc/CommitLog-NNNNNNNN.log

The stack traces are one of:

Exception encountered during startup.
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:246)
...

or

Exception encountered during startup.
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:318)
...

Fortunately, I have the luxury of clearing out the data in the cluster, but I'd like a more elegant option than that. Does anybody have any suggestions?

Thanks,
Matt