cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
Date Fri, 11 Sep 2015 03:06:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740075#comment-14740075 ]

Stefania commented on CASSANDRA-7392:
-------------------------------------

Thanks for the detailed explanation of the gains of lazySet vs. CAS. Since we don't care about
a slight imprecision in logging, let's keep lazySet.

bq. I see options in the YAML as a bad thing and I also don't like undocumented options. I
also fear shipping with hard coded constants because it means there is no option other than
recompiling in the field.

bq. I see properties as bridging the gap between the next release where we either fix the
implementation so tuning the defaults isn't necessary or make it an option in the YAML if
we can't make it just work without operator intervention.

I think I misunderstood what you meant by property. I have moved them from {{Config}} to static
values in {{MonitoringTask}} that are set via {{System.getProperty()}}.
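
As a rough illustration of static values populated from system properties, here is a minimal sketch. The class name, the property prefix and the property keys are hypothetical; the actual names used in {{MonitoringTask}} are not given in this message.

```java
// Sketch only: MonitoringProperties, PREFIX and the key names are hypothetical,
// not the identifiers used in the actual patch.
final class MonitoringProperties {
    // Hypothetical prefix for the tuning properties.
    static final String PREFIX = "example.monitoring.";

    private MonitoringProperties() {}

    // Reads an integer system property, falling back to the default when the
    // property is unset or not a valid number.
    static int intProperty(String name, int defaultValue) {
        return Integer.getInteger(PREFIX + name, defaultValue);
    }
}
```

An operator could then pass something like {{-Dexample.monitoring.check_interval_ms=75}} on the JVM command line, without a YAML option and without recompiling.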

bq. WRT to the default. I would say 1% of the timeout is pretty high precision. You do have
some insight there thinking about the frequency as a % of the timeout. I would say go with
that and set the check frequency as a % of the timeout unless overridden by a property, but
also set a minimum value. I think 50 milliseconds is good as a minimum. I would say 10% is
good enough so we would be off by around 1/10th of the timeout, but I don't feel strongly.

Sounds good, done. Enforcing a minimum of 50 milliseconds slows down the unit tests a bit,
however, because it gets messy to override the minimum as well: the singleton is submitted
for scheduling before we can change any class field. I could move the properties to another
class to make it a bit cleaner.
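
The rule discussed above (10% of the timeout, with a 50 millisecond floor, unless a property overrides it) can be sketched as follows. The class and method names are hypothetical, and letting an explicit override bypass the minimum is an assumption; the message does not say how the two interact.

```java
// Sketch only: CheckInterval and compute() are hypothetical names.
final class CheckInterval {
    static final long MIN_INTERVAL_MS = 50; // minimum proposed in the discussion
    static final long TIMEOUT_PERCENT = 10; // check every 10% of the timeout

    private CheckInterval() {}

    // overrideMs is the value of an optional tuning property, or null when unset.
    // Assumption: an explicit override is used as-is, without applying the minimum.
    static long compute(long timeoutMs, Long overrideMs) {
        if (overrideMs != null)
            return overrideMs;
        return Math.max(MIN_INTERVAL_MS, timeoutMs * TIMEOUT_PERCENT / 100);
    }
}
```

With a 10 second timeout this gives a 1 second check interval, so a timed-out query is noticed at most about 1/10th of the timeout late. For timeouts below 500 milliseconds the 50 millisecond floor applies.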

bq. That log message is fixed size (% of verbs) and covers all dropped messages and not just
the subset of timeouts you are working on right now. I would say leave it alone just by virtue
of being out of scope.

Ok.

--

I rebased and submitted these new changes. I've also added min/max/avg logging when we
have more than one timeout, rather than just displaying the details of the first and last timeout.
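
For illustration, a min/max/avg summary of this kind could be built as in the sketch below. The class name and the message wording are hypothetical and do not reproduce the log format of the actual patch.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.LongSummaryStatistics;

// Sketch only: TimeoutSummary and the message text are hypothetical.
final class TimeoutSummary {
    private TimeoutSummary() {}

    // Builds one log line from the elapsed times (in ms) of the timed-out operations.
    static String summarize(long[] elapsedMs) {
        LongSummaryStatistics stats = Arrays.stream(elapsedMs).summaryStatistics();
        if (stats.getCount() == 0)
            return "no operations timed out";
        if (stats.getCount() == 1)
            return String.format(Locale.ROOT, "1 operation timed out after %d ms", stats.getMin());
        // With more than one timeout, report aggregates instead of individual details.
        return String.format(Locale.ROOT,
                             "%d operations timed out: min %d ms, max %d ms, avg %.1f ms",
                             stats.getCount(), stats.getMin(), stats.getMax(), stats.getAverage());
    }
}
```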

--

bq. I think that's it. NoSpamLogger doesn't allow you to specify a policy for backoff as opposed
to fixed intervals. I think that is a missing capability. Have it support a policy, and then
tell you whether it is time to log so you can decide whether to clear the stats.

bq. One caveat that occurs to me of logging this kind of thing at a variable rate is that
absolute counts of events are no longer informative. You need to log a rate so you can compare
without having to do your own math. Even then there is some harm because you could grow the
reporting interval to include time where nothing is going wrong distorting the reported rate.
I think there is some tension between my pony and providing precise data. What do you think?

How would we calculate the rate without also storing the totals? I'm not sure variable rate
logging is the best way to go about it given that we are trying to achieve a _poor man's "slow
query log" for free_. The issue is how to avoid polluting log files, so the effort required
to support variable rate logging would perhaps be better spent logging the timed-out queries
elsewhere?

[~jbellis] and [~iamaleksey] WDYT?

> Abort in-progress queries that time out
> ---------------------------------------
>
>                 Key: CASSANDRA-7392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node is overloaded)
> but not queries that time out while being processed.  (Particularly common for index queries
> on data that shouldn't be indexed.)  Adding the latter and logging when we have to interrupt
> one gets us a poor man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
