cassandra-commits mailing list archives

From "David Alves (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-1123) Allow tracing query details
Date Mon, 20 Aug 2012 12:28:39 GMT


David Alves updated CASSANDRA-1123:

    Attachment: 1123.patch

Attached is a patch that implements almost everything mentioned:

- Tracing is implemented asynchronously with a new Stage backed by a single-thread pool that
refuses to execute when its queue is full (as suggested, a warning is logged; experiments
show that under heavy load a traceProbability of 0.001 still works without rejecting events).
This threadpool is also the only one that does not propagate trace events.
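As a sketch of that rejection behaviour (the class name and the queue size are illustrative, not the patch's actual ones), a single-thread stage that drops and warns rather than blocking could look like:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TracingExecutor {
    // Single-threaded pool with a bounded queue that drops work (and logs
    // a warning) instead of blocking the submitting thread when full.
    // The queue size is illustrative, not the patch's actual value.
    static ExecutorService create(int queueSize) {
        return new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                (runnable, executor) ->
                        System.err.println("WARN: tracing queue full, dropping trace event"));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService stage = create(1000);
        stage.submit(() -> System.out.println("trace event recorded"));
        stage.shutdown();
        stage.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```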

- Allows enabling/disabling tracing from the cli. Enable tracing takes two parameters:
traceProbability (the probability that any single request from that client gets traced) and
maxTraceNumber, which caps the number of traces to perform (-1 sets this to Integer.MAX_VALUE,
which is also the default).
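The per-request sampling decision this implies could be sketched like this (class and method names are hypothetical, not the patch's API):

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

public class TraceSampler {
    // Trace a request with probability traceProbability, up to
    // maxTraceNumber traces total (-1 meaning Integer.MAX_VALUE).
    private final double traceProbability;
    private final int maxTraceNumber;
    private final AtomicInteger traced = new AtomicInteger();
    private final Random random = new Random();

    TraceSampler(double traceProbability, int maxTraceNumber) {
        this.traceProbability = traceProbability;
        this.maxTraceNumber = maxTraceNumber < 0 ? Integer.MAX_VALUE : maxTraceNumber;
    }

    boolean shouldTrace() {
        if (traced.get() >= maxTraceNumber)
            return false;                          // trace budget exhausted
        if (random.nextDouble() >= traceProbability)
            return false;                          // not sampled this time
        return traced.incrementAndGet() <= maxTraceNumber;
    }

    public static void main(String[] args) {
        TraceSampler sampler = new TraceSampler(1.0, 2);
        int count = 0;
        for (int i = 0; i < 10; i++)
            if (sampler.shouldTrace()) count++;
        System.out.println(count); // prints 2: capped by maxTraceNumber
    }
}
```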

- Adds the ability to enable tracing in stress (using -tr probability [optionally maxNumTraces]).

- TraceEvents can be built using a fluent builder (TraceEventBuilder) that can also
deserialize events (both from thrift and from IColumns).
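A minimal illustration of the fluent-builder style (field names and the build() output are made up; the real TraceEventBuilder builds TraceEvent objects and handles deserialization too):

```java
public class TraceEventBuilder {
    // Hypothetical subset of the builder: each setter returns `this`
    // so calls can be chained, and build() produces the final event.
    private String name;
    private String type;
    private long durationNanos;

    TraceEventBuilder name(String name)    { this.name = name; return this; }
    TraceEventBuilder type(String type)    { this.type = type; return this; }
    TraceEventBuilder duration(long nanos) { this.durationNanos = nanos; return this; }

    String build() {
        return String.format("TraceEvent[name=%s, type=%s, duration=%dns]",
                             name, type, durationNanos);
    }

    public static void main(String[] args) {
        // Values taken from the sample MutationStage event further below.
        System.out.println(new TraceEventBuilder()
                .name("MutationStage").type("STAGE_START").duration(7045000L).build());
        // prints TraceEvent[name=MutationStage, type=STAGE_START, duration=7045000ns]
    }
}
```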

- All requests in CassandraServer start a tracing session when tracing is enabled, and all
parameters are stored along with the request details. This is done using a ThriftType that
can serialize and deserialize thrift objects into Cassandra.

- User Request/Reply, Stage Start/Finish and Message Request/Reply are traced, along with
specific custom traces (such as apply_mutation and get_column_family).

- The cli gains two new commands for exploring traces:

show tracing summary [request_name] - this displays a summary for a request type:

Summary for sessions of request: batch_mutate
Total Sessions: 500
Total Events: 3190
|                      |    Avg.    |   StdDev.  |    Max.    |    Min.    |     99%    |    Unit    |
| Latency              |      22.48 |      28.68 |     293.90 |       0.56 |     150.38 |       msec |
| Batch Rows           |       1.00 |       0.00 |       1.00 |       1.00 |       1.00 | amount/req |
| Mutations            |       5.00 |       0.00 |       5.00 |       5.00 |       5.00 | amount/req |
| Deletions            |       0.00 |       0.00 |       0.00 |       0.00 |       0.00 | amount/req |
| Columns              |       5.00 |       0.00 |       5.00 |       5.00 |       5.00 | amount/req |
| Counters             |       0.00 |       0.00 |       0.00 |       0.00 |       0.00 | amount/req |
| Super Col.           |       0.00 |       0.00 |       0.00 |       0.00 |       0.00 | amount/req |
| Written Bytes        |     170.00 |       0.00 |     170.00 |     170.00 |     170.00 | amount/req |

Quickest Request sessionId: 135b0c60-eac0-11e1-0000-fe8ebeead9ff
Slowest  Request sessionId: f09c3e10-eabf-11e1-0000-fe8ebeead9ff

explain trace session [sessionId] - displays the complete set of events of a tracing session
along with a to-scale timeline for its execution.

Session Summary: f09c3e10-eabf-11e1-0000-fe8ebeead9ff
Request: batch_mutate
Coordinator: /
Interacting nodes: 2 {[/, /]}
Duration: 293901000 nanoseconds (293.901 msecs)
Consistency Level: ONE
Request Timeline:



Se              - Session start (start of request: batch_mutate)
/Se             - Session end (end of request: batch_mutate)
St[StageName]   - Begin of Stage with StageName (e.g. Mutation)
/St[StageName]  - End of Stage with StageName
Msg             - Arrival of message from coordinator
/Msg            - Reply to message from coordinator
???             - Custom traces 

NOTE: timelines are drawn to scale, but precision is limited to 100 positions.
 When multiple events fall on the same position (due to the limited precision) they are separated by ','.
 When two positions are too close together, trace names are separated by ';'.

Full event list (in order of occurrence)

Id: f09c3e11-eabf-11e1-0000-fe8ebeead9ff  Name: batch_mutate            Type: SESSION_START      Source: /  Timestamp (millis): 1345464589425  Duration (nanos): 73000
Id: f09c8c30-eabf-11e1-0000-fe8ebeead9ff  Name: MessageDeparture[47681] Type: MESSAGE_DEPARTURE  Source: /  Timestamp (millis): 1345464589427  Duration (nanos):
Id: f0c63440-eabf-11e1-0000-7fd66bb03abf  Name: MessageArrival[47681]   Type: MESSAGE_ARRIVAL    Source: /  Timestamp (millis): 1345464589700  Duration (nanos): 47000
Id: f0c745b0-eabf-11e1-0000-7fd66bb03abf  Name: MutationStage           Type: STAGE_START        Source: /  Timestamp (millis): 1345464589707  Duration (nanos): 7045000
Id: f0c745b1-eabf-11e1-0000-7fd66bb03abf  Name: apply_mutation          Type: MISC               Source: /  Timestamp (millis): 1345464589707  Duration (nanos): 25000
Id: f0c793d0-eabf-11e1-0000-7fd66bb03abf  Name: MutationStage           Type: STAGE_FINISH       Source: /  Timestamp (millis): 1345464589709  Duration (nanos): 9617000
Id: f0c91a70-eabf-11e1-0000-fe8ebeead9ff  Name: SESSION_END             Type: SESSION_END        Source: /  Timestamp (millis): 1345464589719  Duration (nanos): 293901000

A timeline from a fully distributed deployment (this one was from ccm) would be much more balanced
(the lower line would be longer).


- Some parameters are hard-coded (such as the tracing queue size, the ttl for trace events, and
the number of trace events to fetch for the summaries). It might be better to make these
configurable somewhere, but I've refrained from adding more config variables without consensus.

- There should be more traces.

- I implemented a distributed system keyspace to be used elsewhere, but in the end reverted
to using the trace keyspace; the reasoning was that a distributed system table might want
different settings than a tracing system (replication factor, for instance). Since tracing has
its own keyspace I also kept the CFMetadata there (I'm glad to move it if that makes sense even
without the dist systable, or if the consensus is that trace should live there).

- In order to implement the summarizing cli command, the commons-math jar is required (for statistics).
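The summary statistics themselves are straightforward; here is a plain-JDK sketch of what that dependency provides (the sample latencies are invented, and the patch itself uses commons-math rather than this hand-rolled version):

```java
import java.util.Arrays;

public class TraceSummary {
    // Plain-JDK sketch of the statistics behind the cli summary;
    // returns {mean, sample stddev, max, min} for a set of latencies.
    static double[] summarize(double[] latenciesMs) {
        double mean = Arrays.stream(latenciesMs).average().orElse(0);
        double variance = Arrays.stream(latenciesMs)
                                .map(x -> (x - mean) * (x - mean))
                                .sum() / (latenciesMs.length - 1); // sample variance
        double max = Arrays.stream(latenciesMs).max().orElse(0);
        double min = Arrays.stream(latenciesMs).min().orElse(0);
        return new double[] { mean, Math.sqrt(variance), max, min };
    }

    public static void main(String[] args) {
        // Individual latencies are not recoverable from the summary table
        // above, so these sample values are made up.
        double[] stats = summarize(new double[] { 0.56, 5.2, 12.9, 22.4, 150.38, 293.90 });
        System.out.printf("avg=%.2f stddev=%.2f max=%.2f min=%.2f%n",
                          stats[0], stats[1], stats[2], stats[3]);
    }
}
```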

- I ended up changing the schema to:

CREATE KEYSPACE trace WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor = '1';

CREATE TABLE trace.trace_events (
  sessionId       timeuuid,
  coordinator     inet,
  eventId         timeuuid,
  description     text,
  duration        bigint,
  happened_at     timestamp,
  name            text,
  payload_types   map<text, text>,
  payload         map<text, blob>,
  source          inet,
  type            text,
  PRIMARY KEY (sessionId, coordinator, eventId));

CREATE INDEX idx_name ON trace.trace_events (name);
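With idx_name in place, the summary command can pull all events for a given request type with a query along these lines (a sketch; the actual read path in the patch may differ):

SELECT sessionId, eventId, duration, happened_at
  FROM trace.trace_events
 WHERE name = 'batch_mutate';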

Note the missing Sessions table (all the info there was redundant) and the fields added to
trace_events.
Finally, I'm having issues using the index from thrift. Is there an incompatibility (i.e.,
with using an IndexExpression against a CQL3 table)?

- Tests pass
> Allow tracing query details
> ---------------------------
>                 Key: CASSANDRA-1123
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2.0
>         Attachments: 1123-3.patch.gz, 1123.patch, 1123.patch
> In the spirit of CASSANDRA-511, it would be useful to trace queries to see where
latency is coming from: how long did row cache lookup take? key search in the index? merging
the data from the sstables? etc.
> The main difference vs setting debug logging is that debug logging is too big of a hammer;
by turning on the flood of logging for everyone, you actually distort the information you're
looking for.  This would be something you could set per-query (or more likely per connection).
> We don't need to be as sophisticated as the techniques discussed in the following papers
but they are interesting reading:

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.

