cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13983) Support a means of logging all queries as they were invoked
Date Tue, 31 Oct 2017 21:31:00 GMT


Ariel Weisberg commented on CASSANDRA-13983:

The goals are a bit different. Using this as the basis for load testing is not the primary
goal, although eventually it will be the basis for that as well. The primary goal is being
able to test correctness (defined as: do these two versions return the same result?) using
your actual data and queries.
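To make "return the same result" concrete, here is a minimal Java sketch of the comparison step. It is not part of the patch. The class name `ResultComparator` and the modelling of each row as a `List<Object>` of column values are hypothetical, and the sketch assumes both versions return rows in the same order and that column values compare meaningfully with `equals` (true for `String`, boxed numbers, and `ByteBuffer`, but not for raw `byte[]`).

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: each row is a List<Object> of column values, compared in order.
final class ResultComparator {
    private ResultComparator() {}

    /** Returns a description of the first difference, or empty if both results match. */
    static Optional<String> firstDifference(List<List<Object>> oldRows, List<List<Object>> newRows) {
        int common = Math.min(oldRows.size(), newRows.size());
        for (int i = 0; i < common; i++) {
            List<Object> a = oldRows.get(i);
            List<Object> b = newRows.get(i);
            if (!Objects.equals(a, b)) {
                return Optional.of("row " + i + " differs: " + a + " vs " + b);
            }
        }
        // All shared rows match; one side may still have extra rows.
        if (oldRows.size() != newRows.size()) {
            return Optional.of("row count differs: " + oldRows.size() + " vs " + newRows.size());
        }
        return Optional.empty();
    }
}
```

Queries whose row order is not guaranteed would need an order-insensitive variant, for example comparing the rows as multisets.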

The replay tool I created previously didn't have access to the values from the queries, which
is what led to this, because I found it was very difficult to synthesize representative queries.
Once I have a log of the entire query, I can adapt a replay tool to replay the queries and
compare their results, which wasn't a goal in CASSANDRA-6572. It's also something that isn't
served by using stress at all.
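If a replay has to cover logs captured on several nodes, the entries must reach the replay tool as a single time-ordered stream. One way to do that is a k-way merge with a priority queue. The following minimal Java sketch assumes each log is already sorted by a per-entry timestamp. `LogEntry`, `LogMerger`, and `timestampMillis` are hypothetical names and do not reflect the actual full query log format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical entry type; real full query log entries carry more fields.
final class LogEntry {
    final long timestampMillis;
    final String query;

    LogEntry(long timestampMillis, String query) {
        this.timestampMillis = timestampMillis;
        this.query = query;
    }

    @Override
    public String toString() {
        return timestampMillis + ":" + query;
    }
}

final class LogMerger {
    // Tracks the next unconsumed entry of one input log.
    private static final class Cursor {
        final Iterator<LogEntry> it;
        LogEntry head;

        Cursor(Iterator<LogEntry> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    /** Merges logs that are each sorted by timestamp into one timestamp-ordered list. */
    static List<LogEntry> merge(List<List<LogEntry>> logs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.head.timestampMillis));
        for (List<LogEntry> log : logs) {
            Iterator<LogEntry> it = log.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it)); // skip empty logs
            }
        }
        List<LogEntry> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // log whose next entry is earliest
            out.add(c.head);
            if (c.it.hasNext()) {
                c.head = c.it.next();
                heap.add(c); // re-insert with its new head
            }
        }
        return out;
    }
}
```

Entries with equal timestamps come out in an arbitrary order. A timestamp merge therefore cannot reproduce the original interleaving of racing queries on its own.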

Stress can't synthesize the corner cases of data types, data values, and data sizes, or the
actual data generated on disk by a previous version of Cassandra.

> Support a means of logging all queries as they were invoked
> -----------------------------------------------------------
>                 Key: CASSANDRA-13983
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL, Observability, Testing, Tools
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 4.0
> For correctness testing it's useful to be able to capture production traffic so that it can be replayed against both the old and new versions of Cassandra while comparing the
> Implementing this functionality once, inside the database, gives high performance and less operational complexity.
> In this patch there is an implementation of a full query log that uses chronicle-queue (Apache licensed; the Maven artifacts are labeled incorrectly in some cases, and its dependencies are also Apache licensed) to implement a rotating log of queries.
> * Single thread asynchronously writes log entries to disk to reduce impact on query latency
> * Heap memory usage is bounded by a weighted queue, with a configurable maximum weight, sitting in front of the logging thread
> * If the weighted queue is full, producers can be blocked or samples can be dropped
> * Disk utilization is bounded by deleting old log segments once a configurable size is
> * The on-disk serialization uses a flexible-schema binary format (chronicle-wire), making it easy to skip unrecognized fields, add new ones, and omit old ones.
> * Can be enabled, configured, disabled, and reset (deleting on-disk data) via JMX; the logging path is configurable via both JMX and YAML
> * Introduce a new {{fqltool}} in /bin that currently implements {{Dump}}, which can dump full query logs in a human-readable format as well as follow active full query logs
> Follow up work:
> * Introduce a new {{fqltool}} command, Replay, which can replay N full query logs against two different clusters, compare the results, and check for inconsistencies. <- Actively working on getting this done
> * Log not just queries but also their results, to facilitate a comparison between the original query result and the replayed result. <- Really just don't have a specific use case at the
> * "Consistent" query logging, allowing replay to fully replicate the original order of execution and completion even in the face of races (including CAS). <- This is more speculative
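The bounded heap usage described in the bullets above (a weighted queue in front of a single logging thread, where producers are either blocked or have samples dropped when it is full) can be illustrated with a minimal Java sketch. It is not the patch's implementation. `WeightedBoundedQueue`, `offer`, `put`, `take`, and `weight` are hypothetical names, and the caller-supplied weigher (for example, an entry's serialized size) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.ToLongFunction;

// Hypothetical sketch: a FIFO queue bounded by total weight (e.g. bytes), not element count.
final class WeightedBoundedQueue<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final ArrayDeque<Long> weights = new ArrayDeque<>(); // weight of each queued item, same order
    private final ToLongFunction<T> weigher;
    private final long maxWeight;
    private long currentWeight;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    WeightedBoundedQueue(long maxWeight, ToLongFunction<T> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    /** Drop-on-full mode: returns false (discarding the item) if it would exceed maxWeight. */
    boolean offer(T item) {
        long w = weigher.applyAsLong(item);
        lock.lock();
        try {
            if (currentWeight + w > maxWeight) {
                return false;
            }
            enqueue(item, w);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Block-on-full mode: waits until enough weight has drained to admit the item. */
    void put(T item) throws InterruptedException {
        long w = weigher.applyAsLong(item);
        if (w > maxWeight) {
            throw new IllegalArgumentException("item weight exceeds queue capacity");
        }
        lock.lockInterruptibly();
        try {
            while (currentWeight + w > maxWeight) {
                notFull.await();
            }
            enqueue(item, w);
        } finally {
            lock.unlock();
        }
    }

    /** Called by the single consumer (logging) thread; blocks while the queue is empty. */
    T take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            currentWeight -= weights.poll();
            notFull.signalAll(); // freed weight may admit several blocked producers
            return items.poll();
        } finally {
            lock.unlock();
        }
    }

    /** Current total weight of queued items. */
    long weight() {
        lock.lock();
        try {
            return currentWeight;
        } finally {
            lock.unlock();
        }
    }

    private void enqueue(T item, long w) {
        items.add(item);
        weights.add(w);
        currentWeight += w;
        notEmpty.signal();
    }
}
```

In this sketch a single logging thread would loop on `take()` and append each entry to disk. Query threads would call `offer()` (drop when full) or `put()` (block when full), depending on configuration. Bounding by weight rather than by count keeps heap usage bounded even when individual queries vary widely in size.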

This message was sent by Atlassian JIRA
