cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
Date Wed, 16 Sep 2015 09:22:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747201#comment-14747201
] 

Stefania commented on CASSANDRA-7392:
-------------------------------------

I've added the no spam logger and moved the logging to DEBUG, increasing the default max number
of queries that we report to 50. I know aggregation doesn't make much sense but it won't hurt,
in case an app sends the same query multiple times. 
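
As a rough illustration of the aggregation described above, here is a minimal sketch that counts repeated query strings and caps the number of distinct queries reported. The class and method names ({{TimedOutQueryReport}}, {{record}}) are hypothetical and are not taken from the patch; only the default cap of 50 comes from this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the code from the patch: aggregates timed-out
// queries by their CQL string and caps how many distinct ones are kept.
final class TimedOutQueryReport {
    private final int maxQueries;                 // e.g. 50, the new default
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private int dropped = 0;                      // queries beyond the cap

    TimedOutQueryReport(int maxQueries) {
        this.maxQueries = maxQueries;
    }

    // Record one timed-out query; identical strings share one entry.
    void record(String cql) {
        Integer seen = counts.get(cql);
        if (seen != null) {
            counts.put(cql, seen + 1);
        } else if (counts.size() < maxQueries) {
            counts.put(cql, 1);
        } else {
            dropped++;                            // cap reached, only count it
        }
    }

    int distinct() { return counts.size(); }
    int count(String cql) { return counts.getOrDefault(cql, 0); }
    int dropped() { return dropped; }
}
```

With mostly unique query strings the map simply fills up to the cap, so the aggregation costs little but only helps when an application repeats the same query.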

I would also prefer to re-introduce the CAS in {{MonitoringStateRef}}: if the worker thread does
not notice that the query was aborted, it will carry on iterating, which defeats the purpose
of aborting the queries.
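
To make the point about the CAS concrete, here is a minimal sketch, assuming a three-state reference that the monitoring thread and the worker thread both update with compare-and-set. It is not the actual {{MonitoringStateRef}} from the patch; the class name, the states and the {{iterate}} helper are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the MonitoringStateRef from the patch.
final class MonitoringStateSketch {
    enum State { IN_PROGRESS, ABORTED, COMPLETED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.IN_PROGRESS);

    // Called by the monitoring thread once the timeout has elapsed.
    // Fails if the worker already completed the query.
    boolean abort() {
        return state.compareAndSet(State.IN_PROGRESS, State.ABORTED);
    }

    // Called by the worker when it finishes iterating.
    // Fails if the query was aborted in the meantime.
    boolean complete() {
        return state.compareAndSet(State.IN_PROGRESS, State.COMPLETED);
    }

    // Checked by the worker between rows so it can stop early.
    boolean isAborted() {
        return state.get() == State.ABORTED;
    }

    // Simulated worker loop: abortAtRow stands in for the monitoring
    // thread firing while the worker is iterating. Returns rows processed.
    static int iterate(MonitoringStateSketch ref, int totalRows, int abortAtRow) {
        int processed = 0;
        for (int row = 0; row < totalRows; row++) {
            if (row == abortAtRow) {
                ref.abort();
            }
            if (ref.isAborted()) {
                break;                 // stop iterating once aborted
            }
            processed++;
        }
        ref.complete();                // no-op if the query was aborted
        return processed;
    }
}
```

Because both transitions start from {{IN_PROGRESS}}, exactly one of {{abort()}} and {{complete()}} can succeed, so a query is never reported as both aborted and completed.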

bq. They are usually going to be unique so there is nothing to aggregate on.

It's actually worse than this. I didn't realize that the CQL string reconstructed from {{ReadCommand}}
is only an approximation, as we don't have all the information there. For example, a query
without a condition on the primary key will be split into several queries as follows:

{code}
SELECT * FROM ks.test2 WHERE token(id) > -1976574744135038542 AND token(id) <= -1551387922747101229 LIMIT 5000: total time 10011 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 1096463829018333632 AND token(id) <= 1355062136393692257 LIMIT 5000: total time 10078 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 8977444122753183931 AND token(id) <= 9033798691964141178 LIMIT 5000: total time 10057 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 2635684107471725435 AND token(id) <= 2755551655031657904 LIMIT 5000: total time 10078 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 9080285075713538993 AND token(id) <= 9187108821678730728 LIMIT 5000: total time 10056 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > -8240319968209337270 AND token(id) <= -7817157413941317374 LIMIT 5000: total time 10032 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 8340735344052968255 AND token(id) <= 8546322458038003371 LIMIT 5000: total time 10057 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 5722969564085623706 AND token(id) <= 5806785306771146835 LIMIT 5000: total time 10073 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 7726207511295901422 AND token(id) <= 7839180972141923302 LIMIT 5000: total time 10058 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 3532380910529882202 AND token(id) <= 3654921169010564232 LIMIT 5000: total time 10074 msec - timeout 10000 msec
SELECT * FROM ks.test2 WHERE token(id) > 7881865912870825334 AND token(id) <= 7931494104861828509 LIMIT 5000: total time 10058 msec - timeout 10000 msec
{code}

A query with a condition only on non-primary-key columns and {{ALLOW FILTERING}} will not be reported
as such, since the filtering is not done on the worker. There may be other limitations.

To fix this we would either need to pass the original user query all the way to {{ReadCommand}},
which involves changing the serialization format and increases the memory footprint, or we
need to move the reporting to the coordinator, or we would have to rely on a table like we
do for tracing. Unless we try and squeeze the first option in before 3.0 hits, I think the
other two options are best dealt with in a separate ticket where we focus more on logging
rather than aborting queries.



> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node is overloaded)
> but not queries that time out while being processed.  (Particularly common for index queries
> on data that shouldn't be indexed.)  Adding the latter and logging when we have to interrupt
> one gets us a poor man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
