cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5498) Possible NPE on EACH_QUORUM writes
Date Mon, 03 Jun 2013 18:54:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673426#comment-13673426 ]

Jason Brown commented on CASSANDRA-5498:
----------------------------------------

[~jjordan] working on it now on #cassandra-dev IRC. My suspicion is a problem with Gossiper.addSavedEndpoint(),
which clears out the endpoint's previous data from the endpointStateMap when a node with a
greater messaging version attempts to connect. That in turn causes the downstream effect in
DSWRH when it requests the DC data from the Ec2Snitch, which gets it from Gossiper.endpointStateMap.

Here's the server-side stacktrace:

{code}ERROR [RPC-Thread:150339] 2013-05-08 17:29:55,048 Cassandra.java (line 3462) Internal error processing batch_mutate
java.lang.NullPointerException
at org.apache.cassandra.service.DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes(DatacenterSyncWriteResponseHandler.java:109)
at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:253)
at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:194)
at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:639)
at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:590)
at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:598)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
at org.apache.cassandra.thrift.CustomTHsHaServer$Invocation.run(CustomTHsHaServer.java:105)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662){code}
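
To make the suspected chain of events concrete, here is a minimal sketch of a gossip-backed DC lookup that loses its data once an endpoint's state is cleared. It is not Cassandra's actual code: the class name, the string-keyed map, the sample address, and the "UNKNOWN-DC" placeholder are all assumptions made for illustration, and addSavedEndpoint() below models only the suspected state-clearing behaviour described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a gossip-backed snitch; not Cassandra's real classes.
public class GossipDcLookupSketch {
    // Placeholder returned when no gossip state is known (the value is an assumption).
    static final String UNKNOWN_DC = "UNKNOWN-DC";

    // endpoint address -> datacenter name learned via gossip
    private final Map<String, String> endpointStateMap = new ConcurrentHashMap<>();

    // Record the DC that an endpoint advertised.
    void learn(String endpoint, String dc) {
        endpointStateMap.put(endpoint, dc);
    }

    // Models the suspected behaviour: re-adding a saved endpoint discards its prior state.
    void addSavedEndpoint(String endpoint) {
        endpointStateMap.remove(endpoint);
    }

    // Falls back to the placeholder once the endpoint's state is gone.
    String getDatacenter(String endpoint) {
        return endpointStateMap.getOrDefault(endpoint, UNKNOWN_DC);
    }

    public static void main(String[] args) {
        GossipDcLookupSketch snitch = new GossipDcLookupSketch();
        snitch.learn("10.0.0.1", "us-east");
        System.out.println(snitch.getDatacenter("10.0.0.1"));
        snitch.addSavedEndpoint("10.0.0.1");
        System.out.println(snitch.getDatacenter("10.0.0.1"));
    }
}
```

Under these assumptions, any caller that treats the returned DC name as a key into keyspace-derived data would see a DC it has never heard of until gossip repopulates the state.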
                
> Possible NPE on EACH_QUORUM writes
> ----------------------------------
>
>                 Key: CASSANDRA-5498
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5498
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.10
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: each_quorum, ec2
>             Fix For: 1.2.6
>
>         Attachments: 5498-v1.patch, 5498-v2.patch
>
>
> When upgrading from 1.0 to 1.1, we observed that DatacenterSyncWriteResponseHandler.assureSufficientLiveNodes()
> can throw an NPE if one of the writeEndpoints has a DC that is not listed in the keyspace
> while one of the nodes is down. We observed this while running in EC2 with the Ec2Snitch.
> The exceptions were typically short-lived, but a certain segment of writes (using EACH_QUORUM)
> failed during that time.
> This ticket will address the NPE in DSWRH, while a follow-up ticket will be created once
> we get to the bottom of the incorrect DC being reported from Ec2Snitch.
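
The failure mode in the description can be illustrated with a small, self-contained sketch. It assumes that the live-node check keeps one counter per datacenter listed in the keyspace and looks each live endpoint's DC up in that map. The class and method names are hypothetical, and the guarded variant shows only one possible way to avoid the NPE, not necessarily what the attached patches do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-DC live-endpoint counting; not the actual DSWRH code.
public class PerDcLiveCountSketch {
    // Builds one zeroed counter per datacenter configured for the keyspace.
    private static Map<String, AtomicInteger> emptyCounts(List<String> keyspaceDcs) {
        Map<String, AtomicInteger> counts = new HashMap<>();
        for (String dc : keyspaceDcs)
            counts.put(dc, new AtomicInteger());
        return counts;
    }

    // Unguarded: counts.get(dc) is null for a DC the keyspace does not list,
    // so incrementAndGet() throws a NullPointerException.
    static Map<String, AtomicInteger> countUnguarded(List<String> keyspaceDcs, List<String> liveEndpointDcs) {
        Map<String, AtomicInteger> counts = emptyCounts(keyspaceDcs);
        for (String dc : liveEndpointDcs)
            counts.get(dc).incrementAndGet();
        return counts;
    }

    // Guarded: endpoints whose DC is not part of the keyspace are skipped.
    static Map<String, AtomicInteger> countGuarded(List<String> keyspaceDcs, List<String> liveEndpointDcs) {
        Map<String, AtomicInteger> counts = emptyCounts(keyspaceDcs);
        for (String dc : liveEndpointDcs) {
            AtomicInteger counter = counts.get(dc);
            if (counter != null)
                counter.incrementAndGet();
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keyspaceDcs = Arrays.asList("us-east", "us-west");
        // One live endpoint reports a DC that the keyspace does not know about.
        List<String> liveEndpointDcs = Arrays.asList("us-east", "UNKNOWN-DC", "us-west");

        System.out.println("guarded us-east count: " + countGuarded(keyspaceDcs, liveEndpointDcs).get("us-east"));
        try {
            countUnguarded(keyspaceDcs, liveEndpointDcs);
        } catch (NullPointerException e) {
            System.out.println("NPE for a DC not listed in the keyspace");
        }
    }
}
```

Skipping the unknown DC keeps the write path from failing outright, but it does not explain why the snitch reported that DC in the first place, which is why the description defers that question to a follow-up ticket.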

