cassandra-dev mailing list archives

From Sam Overton
Subject tracing improvements
Date Wed, 25 Jan 2017 20:55:34 GMT
Hello cassandra-dev,

I would like to continue the momentum on improving Cassandra's tracing,
following Mick's excellent work on pluggable tracing and Zipkin support.

There are a couple of areas we can improve that would make tracing an even
more useful tool for cluster operators to diagnose ongoing issues.

The control we currently have over tracing is coarse and somewhat
cumbersome. Enabling tracing from the client for a specific query is fine
for developers, particularly in an environment where Zipkin is being used
to trace all parts of the system and show an aggregated view. For an
operator investigating an issue, however, this does not always give us the
control that we need in order to obtain relevant data. We often need to
diagnose an issue without the possibility of making any changes in the
client, and often without prior knowledge of which queries at the
application level are experiencing poor performance.

Our only other instigator of tracing is nodetool settraceprobability, which
affects a single node and gives us no control over precisely which queries
are traced. In practice, it is very difficult to find the relevant queries
that we want to investigate, so we have often resorted to bulk loading the
traces into an external tool for analysis, and this seems sub-optimal when
Cassandra could reduce much of the friction.

I have a few proposals to improve tracing that I'd like to throw out to
the mailing list to get feedback before I start implementing.

1. Include trace_probability as a CF level property, so sampled tracing can
be enabled on a per-CF basis, cluster-wide, by changing the CF property.

2. Allow tracing at the CFS level. If we have a misbehaving host, then it
would be useful to enable sampled tracing at the CFS layer on just that host so
we can investigate queries landing on that replica, rather than just queries
passing through as a coordinator as is currently possible.

3. Add an interface allowing for custom filters which can decide whether
tracing should be enabled for a given query. This is a similar idea to
[1] but following the same pattern that we have for IAuthenticator,
IEndpointSnitch, ConfigurationLoader et al. where the intention is that
default implementations are provided, but abstracted in such a way that
custom implementations can be written for deployments where a specific
type of functionality is required. This would then allow solutions such as
CASSANDRA-11012 [2] without any specific support needing to be written in
Cassandra itself.
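
To make proposal 3 a little more concrete, here is a rough sketch of what
such a filter interface might look like. Everything in it is hypothetical:
TraceFilter, QueryContext, TableProbabilityFilter and ClientAddressFilter
are names made up for illustration and are not existing Cassandra APIs, and
the client is identified by a plain address string rather than an IP range
to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

// Hypothetical sketch only: none of these types exist in Cassandra.
final class TraceFilterSketch {

    /** Stand-in for the query attributes a filter might want to inspect. */
    static final class QueryContext {
        final String keyspace;
        final String table;
        final String clientAddress;

        QueryContext(String keyspace, String table, String clientAddress) {
            this.keyspace = keyspace;
            this.table = table;
            this.clientAddress = clientAddress;
        }
    }

    /** Hypothetical pluggable hook deciding whether a query gets traced. */
    interface TraceFilter {
        boolean shouldTrace(QueryContext query);
    }

    /** Samples queries using a probability configured per table (proposal 1). */
    static final class TableProbabilityFilter implements TraceFilter {
        private final Map<String, Double> probabilityByTable; // key: "keyspace.table"
        private final DoubleSupplier random;                  // yields values in [0, 1)

        TableProbabilityFilter(Map<String, Double> probabilityByTable) {
            this(probabilityByTable, () -> ThreadLocalRandom.current().nextDouble());
        }

        // The random source is injectable so the sampling decision can be tested.
        TableProbabilityFilter(Map<String, Double> probabilityByTable, DoubleSupplier random) {
            this.probabilityByTable = probabilityByTable;
            this.random = random;
        }

        @Override
        public boolean shouldTrace(QueryContext query) {
            String key = query.keyspace + "." + query.table;
            double p = probabilityByTable.getOrDefault(key, 0.0);
            return random.getAsDouble() < p; // tables with no entry are never traced
        }
    }

    /** Traces only queries from one client address (the CASSANDRA-11012 idea). */
    static final class ClientAddressFilter implements TraceFilter {
        private final String tracedAddress;

        ClientAddressFilter(String tracedAddress) {
            this.tracedAddress = tracedAddress;
        }

        @Override
        public boolean shouldTrace(QueryContext query) {
            return tracedAddress.equals(query.clientAddress);
        }
    }

    public static void main(String[] args) {
        QueryContext query = new QueryContext("ks", "events", "10.0.0.1");
        TraceFilter byTable = new TableProbabilityFilter(Map.of("ks.events", 0.01));
        TraceFilter byClient = new ClientAddressFilter("10.0.0.1");
        System.out.println("sampled by table: " + byTable.shouldTrace(query));
        System.out.println("matched by client: " + byClient.shouldTrace(query));
    }
}
```

The point of the single-method interface is that a default implementation
could cover the common sampled case, while a deployment with unusual needs
could supply its own class, much as is done today for IAuthenticator or
IEndpointSnitch.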

Thanks for reading!


[1] Facility to write code to selectively trigger trace or log for queries

[2] Allow tracing CQL of a specific client only, based on IP (range)
