cassandra-commits mailing list archives

From "David King (JIRA)" <j...@apache.org>
Subject [jira] Created: (CASSANDRA-1463) Failed bootstrap can cause NPE in batch_mutate on every node, taking down the entire cluster
Date Fri, 03 Sep 2010 20:51:36 GMT
Failed bootstrap can cause NPE in batch_mutate on every node, taking down the entire cluster
--------------------------------------------------------------------------------------------

                 Key: CASSANDRA-1463
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1463
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.5
            Reporter: David King


While adding a node to the cluster, the bootstrap failed (cause still under investigation). An
hour later, the entire cluster failed and stopped accepting writes. The following exceptions
began appearing in the logs:

{quote}
 INFO [Timer-0] 2010-09-03 12:23:33,282 Gossiper.java (line 402) FatClient /10.251.243.191 has been silent for 3600000ms, removing from gossip
ERROR [Timer-0] 2010-09-03 12:23:33,318 Gossiper.java (line 99) Gossip error
java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1048)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
        at java.util.TimerThread.mainLoop(Timer.java:534)
        at java.util.TimerThread.run(Timer.java:484)
ERROR [pool-1-thread-69153] 2010-09-03 12:23:33,857 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
        at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
        at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
ERROR [pool-1-thread-69154] 2010-09-03 12:23:33,869 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
        at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
        at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
        at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
{quote}
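
The first trace shows a ConcurrentModificationException raised from Hashtable$Enumerator.next. As a generic illustration only (this is not Cassandra's code, and the class, method, and address names below are hypothetical), the following sketch shows how a fail-fast Hashtable iterator throws when the table is structurally modified during iteration, and how removing through the iterator itself avoids that in the single-threaded case:

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

public class HashtableCmeSketch {
    // Hypothetical stand-in for a per-endpoint state table; not Cassandra's actual structure.
    static Map<String, String> sampleState() {
        Map<String, String> state = new Hashtable<>();
        state.put("10.0.0.1", "alive");
        state.put("10.0.0.2", "alive");
        state.put("10.0.0.3", "silent");
        return state;
    }

    // Removes entries through the map while a key-set iterator is active.
    // Returns true if the iterator detected the modification and threw.
    public static boolean unsafeRemovalThrows() {
        Map<String, String> state = sampleState();
        try {
            for (String endpoint : state.keySet()) {
                state.remove(endpoint); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes entries through the iterator, which keeps its modification count in sync.
    // Returns the number of entries left in the table.
    public static int safeRemovalLeaves() {
        Map<String, String> state = sampleState();
        Iterator<Map.Entry<String, String>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            if ("silent".equals(it.next().getValue())) {
                it.remove();
            }
        }
        return state.size();
    }

    public static void main(String[] args) {
        System.out.println("unsafe removal threw CME: " + unsafeRemovalThrows());
        System.out.println("entries left after safe removal: " + safeRemovalLeaves());
    }
}
```

Iterator.remove only protects against modification by the iterating thread. If a different thread changes the table during iteration, the iteration would need external synchronization or a concurrent map; whether that applies here is not established by the trace alone.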

After at least several thousand repetitions of these errors, the logged exception was shortened
to the form below, with no stack trace (this shortening is what led me to mistakenly file #1462):

{quote}
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,857 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,883 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,894 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68970] 2010-09-03 12:39:22,985 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
ERROR [pool-1-thread-68970] 2010-09-03 12:39:23,084 Cassandra.java (line 1659) Internal error processing batch_mutate
java.lang.NullPointerException
{quote}

A rolling restart of the cluster fixed it, but every node had to be restarted before the
cluster started accepting writes again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

