Date: Fri, 3 Sep 2010 16:51:36 -0400 (EDT)
From: "David King (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] Created: (CASSANDRA-1463) Failed bootstrap can cause NPE in batch_mutate on every node, taking down the entire cluster

Failed bootstrap can cause NPE in batch_mutate on every node, taking down the entire cluster
--------------------------------------------------------------------------------------------

Key: CASSANDRA-1463
URL:
https://issues.apache.org/jira/browse/CASSANDRA-1463
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.6.5
Reporter: David King

While adding a node to the cluster, the bootstrap failed (I am still investigating the cause). An hour later the entire cluster failed and stopped accepting writes. This exception started appearing in the logs:

{quote}
 INFO [Timer-0] 2010-09-03 12:23:33,282 Gossiper.java (line 402) FatClient /10.251.243.191 has been silent for 3600000ms, removing from gossip
ERROR [Timer-0] 2010-09-03 12:23:33,318 Gossiper.java (line 99) Gossip error
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1048)
	at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
	at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
	at java.util.TimerThread.mainLoop(Timer.java:534)
	at java.util.TimerThread.run(Timer.java:484)
ERROR [pool-1-thread-69153] 2010-09-03 12:23:33,857 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
	at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
	at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
ERROR [pool-1-thread-69154] 2010-09-03 12:23:33,869 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
	at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
	at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
	at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
	at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)
{quote}

After many repetitions of that (at least thousands), the logged exception was shortened to the following (this shortening is what led me to mistakenly file #1462):

{quote}
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,857 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,883 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,894 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68970] 2010-09-03 12:39:22,985 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68970] 2010-09-03 12:39:23,084 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
{quote}

A rolling restart of the cluster fixed it, but every node had to be restarted before the cluster accepted writes again.
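Both traces are consistent with well-known Java failure modes. The `ConcurrentModificationException` is thrown from `Hashtable$Enumerator.next` inside `Gossiper.doStatusCheck`, 36 ms after the FatClient was logged as being removed from gossip. That is what a fail-fast iterator does when its map is modified mid-loop. The `NullPointerException` in `FailureDetector.isAlive` is consistent with a lookup for an endpoint whose state gossip no longer holds. The sketch below is a hypothetical, simplified model of both modes, not Cassandra's code. `GossipSketch`, `lastHeard`, and all of the method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

// Hypothetical model only: names and structure are invented, NOT Cassandra's code.
class GossipSketch {
    // endpoint -> time (ms) it was last heard from; a Hashtable, as in the trace
    final Map<String, Long> lastHeard = new Hashtable<>();

    // Unsafe: removes entries while a for-each loop iterates the key set.
    // Hashtable's key-set iterator is fail-fast, so once the map is modified,
    // requesting the next element throws ConcurrentModificationException.
    void evictSilentUnsafe(long now, long quarantineMs) {
        for (String ep : lastHeard.keySet()) {
            if (now - lastHeard.get(ep) > quarantineMs) {
                lastHeard.remove(ep);
            }
        }
    }

    // Safer: iterate over a snapshot of the keys and remove from the live map.
    List<String> evictSilentSafe(long now, long quarantineMs) {
        List<String> removed = new ArrayList<>();
        for (String ep : new ArrayList<>(lastHeard.keySet())) {
            Long heard = lastHeard.get(ep);
            if (heard != null && now - heard > quarantineMs) {
                lastHeard.remove(ep);
                removed.add(ep);
            }
        }
        return removed;
    }

    // Unsafe: auto-unboxing a missing entry throws NullPointerException,
    // analogous to asking about an endpoint that was already evicted.
    boolean isAliveUnsafe(String ep, long now, long timeoutMs) {
        long heard = lastHeard.get(ep); // NPE if ep has no entry
        return now - heard <= timeoutMs;
    }

    // Null-safe: an endpoint with no state is reported as not alive.
    boolean isAlive(String ep, long now, long timeoutMs) {
        Long heard = lastHeard.get(ep);
        return heard != null && now - heard <= timeoutMs;
    }
}
```

Iterating over a snapshot of the keys avoids the first mode. So does removing through `Iterator.remove()`, or switching to a `ConcurrentHashMap`, whose iterators never throw `ConcurrentModificationException`. Treating a missing entry as "not alive" avoids the second mode. This report does not say how CASSANDRA-1463 was actually fixed.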