Date: Thu, 20 Jun 2013 04:50:21 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5668) NPE in net.OutputTcpConnection when tracing is enabled

[ https://issues.apache.org/jira/browse/CASSANDRA-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688866#comment-13688866 ]

Jonathan Ellis commented on CASSANDRA-5668:
-------------------------------------------

Okay, here's what's happening:

{noformat}
 INFO [Thrift:1] 2013-06-19 23:36:51,719 Tracing.java (line 176) session 0702a620-d963-11e2-832d-53376523a4a2 is complete
java.lang.AssertionError: Asked to trace TYPE:MUTATION VERB:MUTATION for session 0702a620-d963-11e2-832d-53376523a4a2 but that state does not exist
{noformat}

cqlsh is requesting QUORUM CL (or ONE?)
so once that CL is achieved, the coordinator sends success to the client and closes the tracing session. If messages to the other replicas have not gone out yet by then, the lookup of the session's trace state fails and we error.

But it gets worse... Once the coordinator's state is discarded, any late-arriving replies will create a new, "non-local" session. Since the coordinator will not send any more messages for this session (which is the trigger we use on replicas to indicate "we're done"), the non-local session will persist indefinitely, "leaking" memory.

I think we can solve both of these:

# Make a static TraceState method that needs only the session id to log an event. OutboundTcpConnection (OTC) can use this to avoid looking up the TraceState at all; if the state has already been cleared out, that is not a problem.
# Make Tracing.sessions an expiring map, so that sessions we don't clean up manually still get removed.

Alternatively, we could go with #2 by itself and not try to clean up manually at all. Average-case memory use will be worse, but that may be acceptable since we assume only a tiny fraction of requests are traced at all.

What do you think [~slebresne]?

> NPE in net.OutputTcpConnection when tracing is enabled
> ------------------------------------------------------
>
>                 Key: CASSANDRA-5668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5668
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.6, 2.0 beta 1
>            Reporter: Ryan McGuire
>         Attachments: 5668-assert-2.txt, 5668-assert.txt, 5668-logs.tar.gz, 5668_npe_ddl.cql, 5668_npe_insert.cql, system.log
>
>
> I get multiple NullPointerExceptions when trying to trace INSERT statements.
> To reproduce:
> {code}
> $ ccm create -v git:trunk
> $ ccm populate -n 3
> $ ccm start
> $ ccm node1 cqlsh < 5668_npe_ddl.cql
> $ ccm node1 cqlsh < 5668_npe_insert.cql
> {code}
> You will then see many exceptions like this in the logs of node1:
> {code}
> ERROR [WRITE-/127.0.0.3] 2013-06-19 14:54:35,885 OutboundTcpConnection.java (line 197) error writing to /127.0.0.3
> java.lang.NullPointerException
> 	at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:182)
> 	at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:144)
> {code}
> This is similar to CASSANDRA-5658, which is the reason npe_ddl and npe_insert are separate files.
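A minimal Java sketch of the two fixes proposed in the comment above, assuming a purge-on-access expiring map built on ConcurrentHashMap. Every name here (TraceSessionsSketch, ExpiringSessions, trace, the TTL value, and the in-memory EVENTS list standing in for the trace events table) is hypothetical and is not Cassandra's actual Tracing/TraceState code. It shows fix #1 (logging an event from the session id alone, so a session the coordinator already closed cannot cause a failed lookup) and fix #2 (a late-created "non-local" session that nobody removes still goes away once its TTL elapses).

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongSupplier;

// Hypothetical sketch of the two proposed fixes; not Cassandra's real Tracing/TraceState code.
public class TraceSessionsSketch {

    // Stand-in for per-session trace state (hypothetical).
    static final class TraceState {
        final UUID sessionId;
        final long createdAtMillis;

        TraceState(UUID sessionId, long createdAtMillis) {
            this.sessionId = sessionId;
            this.createdAtMillis = createdAtMillis;
        }
    }

    // Fix #2: a sessions map whose entries expire even if nobody removes them explicitly.
    static final class ExpiringSessions {
        private final Map<UUID, TraceState> sessions = new ConcurrentHashMap<>();
        private final long ttlMillis;
        private final LongSupplier clock; // injectable so expiry is deterministic in tests

        ExpiringSessions(long ttlMillis, LongSupplier clock) {
            this.ttlMillis = ttlMillis;
            this.clock = clock;
        }

        // A late reply for an unknown session creates a new entry, as described in the comment.
        TraceState getOrCreate(UUID id) {
            purgeExpired();
            return sessions.computeIfAbsent(id, k -> new TraceState(k, clock.getAsLong()));
        }

        // Explicit cleanup, e.g. when the coordinator closes the session.
        void remove(UUID id) {
            sessions.remove(id);
        }

        int size() {
            purgeExpired();
            return sessions.size();
        }

        // Drops every session older than the TTL (measured from creation); O(n) per call,
        // which is tolerable only because few requests are traced.
        private void purgeExpired() {
            long now = clock.getAsLong();
            sessions.values().removeIf(s -> now - s.createdAtMillis > ttlMillis);
        }
    }

    // Stand-in for the trace events table (hypothetical).
    static final List<String> EVENTS = new CopyOnWriteArrayList<>();

    // Fix #1: log an event from the session id alone, with no TraceState lookup,
    // so a session the coordinator has already closed cannot trigger an NPE here.
    static void trace(UUID sessionId, String message) {
        EVENTS.add(sessionId + " " + message);
    }

    public static void main(String[] args) {
        long[] now = {0L};
        ExpiringSessions sessions = new ExpiringSessions(1_000L, () -> now[0]);
        UUID id = UUID.randomUUID();

        sessions.getOrCreate(id);                   // coordinator opens the session
        sessions.remove(id);                        // ...and closes it once the CL is met
        trace(id, "Sending message to /127.0.0.3"); // late outbound message: logged, no lookup

        sessions.getOrCreate(id);                   // late reply creates a "non-local" session
        now[0] = 5_000L;                            // nobody removes it, but the TTL elapses
        System.out.println(sessions.size());        // prints 0
        System.out.println(EVENTS.size());          // prints 1
    }
}
```

Going with #2 alone would amount to never calling remove(...) and relying purely on the TTL. Purge-on-access keeps the sketch dependency-free; a scheduled sweeper, or an existing cache such as Guava's CacheBuilder with expireAfterWrite, would be alternatives. Expiring from creation time rather than last access means a very long-running traced request could lose its state, so the TTL would need to exceed the longest expected request.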