cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Morton (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-3548) NPE in AntiEntropyService$RepairSession.completed()
Date Thu, 01 Dec 2011 11:36:39 GMT
NPE in AntiEntropyService$RepairSession.completed()
---------------------------------------------------

                 Key: CASSANDRA-3548
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3548
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.1
         Environment: Free BSD 8.2, JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0
            Reporter: Aaron Morton
            Assignee: Aaron Morton
            Priority: Minor


This may be related to CASSANDRA-3519 (cluster it was observed on is still 1.0.1), however
i think there is still a race condition.

Observed on a 2 DC cluster, during a repair that spanned the DC's.  

{noformat}
INFO [AntiEntropyStage:1] 2011-11-28 06:22:56,225 StreamingRepairTask.java (line 136) [streaming
task #69187510-1989-11e1-0000-5ff37d368cb6] Forwarding streaming repair of 8602 
ranges to /10.6.130.70 (to be streamed with /10.37.114.10)
...
 INFO [AntiEntropyStage:66] 2011-11-29 11:20:57,109 StreamingRepairTask.java (line 253) [streaming
task #69187510-1989-11e1-0000-5ff37d368cb6] task succeeded
ERROR [AntiEntropyStage:66] 2011-11-29 11:20:57,109 AbstractCassandraDaemon.java (line 133)
Fatal exception in thread Thread[AntiEntropyStage:66,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.completed(AntiEntropyService.java:712)
        at org.apache.cassandra.service.AntiEntropyService$RepairSession$Differencer$1.run(AntiEntropyService.java:912)
        at org.apache.cassandra.streaming.StreamingRepairTask$2.run(StreamingRepairTask.java:186)
        at org.apache.cassandra.streaming.StreamingRepairTask$StreamingRepairResponse.doVerb(StreamingRepairTask.java:255)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
{noformat}

One of the nodes involved in the repair session failed, e.g. (Not sure if this is from the
same repair session as the streaming task above, but it illustrates the issue)

{noformat}
ERROR [AntiEntropySessions:1] 2011-11-28 19:39:52,507 AntiEntropyService.java (line 688) [repair
#2bf19860-197f-11e1-0000-5ff37d368cb6] session completed with the following error
java.io.IOException: Endpoint /10.29.60.10 died
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:725)
        at org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:762)
        at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:192)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:559)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:62)
        at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:167)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
ERROR [GossipTasks:1] 2011-11-28 19:39:52,507 StreamOutSession.java (line 232) StreamOutSession
/10.29.60.10 failed because {} died or was restarted/removed
ERROR [GossipTasks:1] 2011-11-28 19:39:52,571 Gossiper.java (line 172) Gossip error
java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:782)
        at java.util.ArrayList$Itr.next(ArrayList.java:754)
        at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:190)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:559)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:62)
        at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:167)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

{noformat}

When a node is marked as failed AntiEntropyService.RepairSession.forceShutdown() clears the
activejobs map. But the jobs to other nodes will continue, and will eventually call completed().


RepairSession.terminated should stop completed() from checking the map, but there is a race
between the map been cleared and if there is an error in finally block it wont be set. 


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message