cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5668) NPE in net.OutputTcpConnection when tracing is enabled
Date Thu, 20 Jun 2013 04:50:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688866#comment-13688866 ]

Jonathan Ellis commented on CASSANDRA-5668:
-------------------------------------------

Okay, here's what's happening:

{noformat}
 INFO [Thrift:1] 2013-06-19 23:36:51,719 Tracing.java (line 176) session 0702a620-d963-11e2-832d-53376523a4a2 is complete

java.lang.AssertionError: Asked to trace TYPE:MUTATION VERB:MUTATION for session 0702a620-d963-11e2-832d-53376523a4a2 but that state does not exist
{noformat}

cqlsh is requesting QUORUM CL (or ONE?), so once that's achieved the coordinator sends success
to the client and closes the tracing session.

If other messages have not yet gone out when the session is closed, tracing them fails with
the assertion above.

But it gets worse...

Once the coordinator's state is discarded, any late-arriving replies will create a new, "non-local"
session.  Since the coordinator will not send any messages again for this session -- which
is the trigger we use on replicas to indicate "we're done" -- the non-local session will persist
indefinitely, "leaking" memory.
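
To make the leak concrete, here is a minimal sketch of the sequence. It is not the actual Tracing code: the class LeakSketch, its methods, and the "local"/"non-local" strings are hypothetical stand-ins, and it assumes only that an incoming message with an unknown session id gets fresh non-local state.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the tracing session registry; not Cassandra's real classes.
final class LeakSketch {
    // Session id -> kind of state held for it ("local" or "non-local").
    static final Map<UUID, String> sessions = new ConcurrentHashMap<>();

    // Coordinator opens a local session for a traced request.
    static void begin(UUID id) {
        sessions.put(id, "local");
    }

    // Coordinator discards its state once the consistency level is met.
    static void complete(UUID id) {
        sessions.remove(id);
    }

    // An incoming message looks up state by session id; an unknown id is
    // assumed to belong to a remote coordinator, so new state is created.
    static String onMessage(UUID id) {
        return sessions.computeIfAbsent(id, k -> "non-local");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        begin(id);
        complete(id);   // client has its answer, session closed
        onMessage(id);  // late reply arrives after the close
        // Nothing will ever remove this entry again.
        System.out.println(sessions.containsKey(id)); // prints true
    }
}
```

After the late reply the map holds an entry that no later message will ever clear, which is the leak.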

I think we can solve both of these:
# Make a static TraceState method that only needs the session id to be passed in to log an
event.  OTC can use this to avoid having to look up the TraceState at all; if it's cleared out,
not a problem.
# Make Tracing.sessions an expiring map so sessions we don't clean up manually still get removed.
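
A rough sketch of how the two proposals could fit together, under stated assumptions. TracingSketch, trace, touch, expire and TTL_MILLIS are hypothetical names rather than the real Tracing/TraceState API; the event list stands in for wherever trace events are actually written; and the hand-rolled deadline map only illustrates the idea of an expiring map.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class TracingSketch {
    // Stand-in for the place trace events are persisted.
    static final List<String> events = new CopyOnWriteArrayList<>();

    // Session id -> deadline (millis) after which the entry may be dropped.
    static final Map<UUID, Long> sessions = new ConcurrentHashMap<>();

    // Arbitrary time-to-live chosen for the example.
    static final long TTL_MILLIS = 60_000;

    // Proposal 1: log an event from the session id alone. No state lookup,
    // so it cannot fail when the coordinator has already closed the session.
    static void trace(UUID sessionId, String message) {
        events.add(sessionId + ": " + message);
    }

    // Create or refresh a session entry with a new deadline.
    static void touch(UUID sessionId, long nowMillis) {
        sessions.put(sessionId, nowMillis + TTL_MILLIS);
    }

    // Proposal 2: drop every session whose deadline has passed, including
    // ones that were never cleaned up manually.
    static void expire(long nowMillis) {
        sessions.values().removeIf(deadline -> deadline <= nowMillis);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        touch(id, 0L);                 // late reply created a non-local session
        trace(id, "sending MUTATION"); // works whether or not state exists
        expire(TTL_MILLIS);            // periodic sweep removes the stale entry
        System.out.println(sessions.containsKey(id)); // prints false
    }
}
```

The clock is passed in as a parameter only to keep the sketch deterministic; a real expiring map would take care of the sweeping itself.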

Alternatively, we could just go with #2 by itself and not try to clean up manually at all.
Average case memory used will be worse, but maybe that is okay since we assume only a tiny
fraction of requests are traced at all.

What do you think [~slebresne]?
                
> NPE in net.OutputTcpConnection when tracing is enabled
> ------------------------------------------------------
>
>                 Key: CASSANDRA-5668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5668
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.6, 2.0 beta 1
>            Reporter: Ryan McGuire
>         Attachments: 5668-assert-2.txt, 5668-assert.txt, 5668-logs.tar.gz, 5668_npe_ddl.cql, 5668_npe_insert.cql, system.log
>
>
> I get multiple NullPointerExceptions when trying to trace INSERT statements.
> To reproduce:
> {code}
> $ ccm create -v git:trunk
> $ ccm populate -n 3
> $ ccm start
> $ ccm node1 cqlsh < 5668_npe_ddl.cql
> $ ccm node1 cqlsh < 5668_npe_insert.cql
> {code}
> And see many exceptions like this in the logs of node1:
> {code}
> ERROR [WRITE-/127.0.0.3] 2013-06-19 14:54:35,885 OutboundTcpConnection.java (line 197) error writing to /127.0.0.3
> java.lang.NullPointerException
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:182)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:144)
> {code}
> This is similar to CASSANDRA-5658 and is the reason that npe_ddl and npe_insert are separate files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
