Date: Mon, 28 Sep 2015 21:12:05 +0000 (UTC)
From: "Ariel Weisberg (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out

    [ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933983#comment-14933983 ]

Ariel Weisberg edited comment on CASSANDRA-7392 at 9/28/15 9:11 PM:
--------------------------------------------------------------------

* Use a dedicated thread to update the timestamp so it isn't impacted by other activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved its own thread.
I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging.
* I think you want a count of operations that were truncated instead of a boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style-wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time rather than clear it.
* I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a count.
It didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257]
* [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259]
* [Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53]
* [I think failedAt is unused now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223]
* [If we use approximate time for timeouts can we also use it for setting the construction time?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2603bbfead4cdd58e1e08b225338bda0R28]

was (Author: aweisberg):
* Use a dedicated thread to update the timestamp so it isn't impacted by other activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved its own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging.
* I think you want a count of operations that were truncated instead of a boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style-wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time rather than clear it.
* I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a count. It didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257]
* [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259]
* [Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53]
* [I think failedAt is unused now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223]

> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them
(because node is overloaded) but not queries that time out while being processed. (Particularly common for index queries on data that shouldn't be indexed.) Adding the latter and logging when we have to interrupt one gets us a poor man's "slow query log" for free.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
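
For readers following the review points about the average, min and max, here is a minimal sketch of one way to keep those aggregates. It assumes a hypothetical `TimeoutStats` class and measurements in milliseconds; none of these names come from the patch under review, and the patch may structure this differently.

```java
// Hypothetical helper for illustration only; not taken from the CASSANDRA-7392 patch.
final class TimeoutStats {
    private long count;                  // number of measurements seen so far
    private long sum;                    // running total, used to derive the average
    private long min = Long.MAX_VALUE;   // identity value for Math.min
    private long max = Long.MIN_VALUE;   // identity value for Math.max

    // Fold one measurement (assumed to be elapsed milliseconds) into the aggregates.
    void add(long elapsedMillis) {
        count++;
        sum += elapsedMillis;
        min = Math.min(min, elapsedMillis);
        max = Math.max(max, elapsedMillis);
    }

    long count() { return count; }
    long min()   { return min; }
    long max()   { return max; }

    // The average is computed from the sum and the count when asked for,
    // rather than being updated incrementally on every measurement.
    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

With measurements 10, 20 and 60 this gives a count of 3, a min of 10, a max of 60 and an average of 30.0. The class is not thread-safe; the sketch assumes a single reporting thread does the aggregation.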