cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12403) Slow query detecting
Date Wed, 10 Aug 2016 09:13:20 GMT


Stefania commented on CASSANDRA-12403:

Hello Hoshii-san

Thank you for the patch. The approach is correct, you understood exactly what I meant in my
previous comment. 

I created a branch with the initial commit and some suggestions in a follow-up commit. I've
also started the tests, the results are still pending:


The suggestions can be summarized as follows:

* {{logging_slow_query_threshold_in_ms}} in cassandra.yaml was renamed to {{query_log_timeout_in_ms}}
and the default value increased from 200 to 500 milliseconds; I know this is very high but
I prefer to be on the safe side, WDYT?
* The methods associated with the retrieval of the configuration property have been renamed
for consistency with the new yaml property name.
* With the initial version of the patch, the same query would have been potentially added
to the slow queries queue several times, I've changed {{MonitorableImpl}} so that we log a
slow query only when it is completed. This has the advantage that we have the accurate total
time for the query, and that we do not log queries that time out twice.
* Given the change above, it was then possible to refactor further {{MonitorTask}}, which
I did. There was also a problem in that the slow queries queue was bounded but we did not
report in the log message that some slow query logs were missing due to a full queue (via
{{...}} in the message), the refactoring should have fixed this. I've also added some comments
(unrelated to this patch, they should already have been there). 
* I think we can log both slow and timed out queries with the same scheduling task.
* The unit tests have been changed to reflect the points above.
* In {{MonitorTask}} the summary log message for slow queries was downgraded from WARN to
INFO, and the logger wrapped with {{NoSpamLogger}}, which will log the summary message at
most once every 5 minutes. {{NoSpamLogger}} is also used for the summary log message for timed-out
queries, although this is still at WARN level. However, detailed logs will still be logged
every time at DEBUG level in debug.log.

Finally, I've created some distributed tests [here|].
You can run them to see what the log messages will look like. Here is a sample:

INFO  [ScheduledTasks:1] 2016-08-10 16:56:50,209 - Some operations were
slow, details available at debug level (debug.log)
DEBUG [ScheduledTasks:1] 2016-08-10 16:56:50,211 - 3 operations were
slow in the last 4998 msecs:
<SELECT * FROM ks.test2 LIMIT 5000>, time 3026 msec - slow timeout 30 msec
<SELECT * FROM ks.test2 WHERE id = 1 LIMIT 5000>, was slow 2 times: avg/min/max 330/325/335
msec - slow timeout 30 msec
<SELECT * FROM ks.test2 WHERE token(id) > 0 LIMIT 5000>, time 1449 msec - slow timeout
30 msec

The INFO message appears in system.log and debug.log but not more than once every 5 minutes.
The DEBUG log appears every 5 seconds (configurable) if there is at least a slow query.

Could you review the follow-up commit and let us know what you think? Please do not hesitate
to express any concerns, especially regarding the logging choices, since they are intended
for operators.

> Slow query detecting
> --------------------
>                 Key: CASSANDRA-12403
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Shogo Hoshii
>            Assignee: Shogo Hoshii
>         Attachments: sample.log, slow_query.patch, slow_query.patch
> Hello,
> In cassandra production environment, users sometimes build anti-pattern tables and throw
queries in inefficient manners.
> So I would like to suggest a feature that enables to log slow query.
> The feature can help cassandra operators to identify bad query patterns.
> Then operators can give advices about queries and data model to users who don't know
cassandra so much.
> This ticket is related to CASSANDRA-6226, and I focus on detecting bad query patterns,
not aborting them.

This message was sent by Atlassian JIRA

View raw message