cassandra-commits mailing list archives

From "Peter Bailis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
Date Tue, 14 May 2013 23:50:13 GMT


Peter Bailis commented on CASSANDRA-5455:

I've thought some more about different options for enabling metrics that are useful to both
PBS (in an external module, if committers prefer) and anyone else who would be interested
in finer-grained tracing.

To start, I *do* think that there is interest in a PBS module: if an eventually consistent
store is returning stale data, how stale *is* it? Especially given that many (most?) Cassandra
client libraries (including the Datastax java-driver) choose CL=ONE by default, I'd expect
most users would prefer to understand how their choice of N, R, and W affects their latency
and consistency.

I've been contacted by several Cassandra users who are interested in and/or using this functionality
and understand that several developers are interested in PBS for Riak (notably, Andy Gross
highlighted PBS in his 2013 RICON East keynote as a useful feature Basho would like). We originally
chose Cassandra based on our familiarity with the code base and on early discussions with
Jonathan but we plan to integrate PBS functionality into Riak with the help of their committers
in the near-term future. So I do think there is interest, and, if you're curious about *use
cases* for this functionality, Shivaram and I will be demoing PBS in Cassandra 1.2 at the
upcoming SIGMOD 2013 conference. Our demo proposal sketches three application vignettes, including
the obvious integration with monitoring tools but also automatically tuning N, R, and W and
providing consistency and latency SLAs.

So, on the more technical side, there are two statistics that aren't currently measured (in
trunk) that are required for accurate PBS predictions. First, PBS requires per-server statistics.
Currently, the ColumnFamily RTT read/write latency metrics are aggregated across all servers.
Second, PBS requires a measure of how long a read/write request takes before it is processed
(i.e., how long it took from a client sending each read/write request to when it was performed).
This requires knowledge of one-way request latencies as well as read/write request-specific latencies.

The 1.2 PBS patch provided both of these, aggregating by server and measuring the delay until
processing. As Jonathan notes above, the latter measurement was conservative--the remote replica
recorded the time that it enqueued its response rather than the exact moment a read or write
was performed, mainly for simplicity of code. The coordinating server could then closely approximate
the return time as RTT-(remote timestamp).
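For concreteness, the prediction itself can be sketched as a Monte Carlo simulation over the four one-way latency distributions PBS uses (write send, write ack, read send, read response). The exponential latency model below is purely a placeholder assumption; a real predictor would resample the measured per-server distributions described above.

```python
import random

def pbs_p_consistent(n, r, w, t_ms, n_trials=10_000, seed=42):
    """Monte Carlo sketch of a PBS-style consistency prediction.

    The exponential latency distributions (1 ms means) are placeholder
    assumptions, not measured Cassandra behavior.
    """
    rng = random.Random(seed)
    lat = lambda: rng.expovariate(1.0)  # one-way latency sample, in ms
    consistent = 0
    for _ in range(n_trials):
        # Per-replica one-way latencies: write send (W), write ack (A),
        # read send (R), read response (S).
        W = [lat() for _ in range(n)]
        A = [lat() for _ in range(n)]
        R = [lat() for _ in range(n)]
        S = [lat() for _ in range(n)]
        # The write returns once the w fastest (send + ack) round trips
        # complete.
        commit = sorted(wi + ai for wi, ai in zip(W, A))[w - 1]
        # A read issued t_ms after the write returns waits for the r
        # fastest replica responses; it is consistent if any of those
        # replicas had received the write before the read arrived.
        fastest = sorted(range(n), key=lambda i: R[i] + S[i])[:r]
        if any(W[i] <= commit + t_ms + R[i] for i in fastest):
            consistent += 1
    return consistent / n_trials
```

Note that for R + W > N the model returns 1.0, matching strict quorum intersection; for CL=ONE it quantifies exactly the staleness question raised above.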

Given these requirements and the current state of trunk, there are a few ways forward to support
an external PBS prediction module:

1a.) Modify Cassandra to store latency statistics on a per-server and per-ColumnFamily granularity.
As Rick Branson has pointed out, this is actually useful for monitoring other than PBS and
can be used to detect slower replicas.
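A sketch of what 1a's bookkeeping might look like (class and method names are hypothetical, not Cassandra's actual metrics API):

```python
import collections
import statistics

class PerServerLatencyStore:
    """Hypothetical sketch of proposal 1a: latency samples keyed by
    (server, column_family) instead of aggregated per ColumnFamily."""

    def __init__(self, max_samples=1024):
        self.max_samples = max_samples
        self.samples = collections.defaultdict(collections.deque)

    def record(self, server, column_family, latency_ms):
        q = self.samples[(server, column_family)]
        q.append(latency_ms)
        if len(q) > self.max_samples:
            q.popleft()  # keep a bounded window of recent samples

    def slowest_replica(self, column_family):
        # The per-server breakdown makes slow-replica detection trivial,
        # which is the monitoring use case noted above.
        medians = {
            server: statistics.median(q)
            for (server, cf), q in self.samples.items()
            if cf == column_family and q
        }
        return max(medians, key=medians.get) if medians else None
```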

1b.) Modify Cassandra to store local processing times for requests (i.e., expand StorageMetrics,
which currently does not track the time required to, say, fulfill a local read stage). This
also helps reveal whether a Cassandra node is slow due to network or to disk.
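Proposal 1b amounts to timing the local stage itself, separately from network round trips. A minimal sketch, with hypothetical names (this is not the StorageMetrics API):

```python
import time
from contextlib import contextmanager

@contextmanager
def stage_timer(metrics, stage):
    """Record the wall-clock processing time of a local stage (e.g. a
    local read) into `metrics`, a plain dict of sample lists."""
    start = time.monotonic()
    try:
        yield
    finally:
        metrics.setdefault(stage, []).append(time.monotonic() - start)
```

Subtracting these local processing times from the end-to-end RTT isolates the network component.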

2.) Use the newly developed tracing functionality to reconstruct latencies for selected requests.
Performing any sort of profiling will require tracing to be enabled (this appears to be somewhat
heavyweight given the amount of data that is logged for each request), and reconstructing
latencies from the trace table may be expensive (i.e., amount to a many-way self-join).
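To illustrate the reconstruction cost, here is roughly what that self-join looks like when done in application code; the event schema here is a simplified stand-in, not the real trace table's layout:

```python
import collections

def latencies_from_trace(events):
    """Group flat trace events by session and compute per-stage deltas.

    `events` is a list of (session_id, source, activity, timestamp_ms)
    tuples; every stage latency requires pairing each event with its
    successor, which a query engine would express as a self-join.
    """
    by_session = collections.defaultdict(list)
    for session_id, source, activity, ts in events:
        by_session[session_id].append((ts, source, activity))
    result = {}
    for session_id, rows in by_session.items():
        rows.sort()  # order events by timestamp within the session
        result[session_id] = [
            (rows[i + 1][2], rows[i + 1][0] - rows[i][0])
            for i in range(len(rows) - 1)
        ]
    return result
```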

3.) Use RTT/2 based on ColumnFamily LatencyMetrics as an inaccurate but already supported
external predictor.

4.) Leave the PBS latency sampling as in 1.2 but remove the PBS predictor code. Expose the
latency samples via an MBean for users like Rick who would benefit from them.

Proposal #1 has benefits for many users and seems a natural extension to the existing metrics
but requires changes to the existing code. Proposal #2 puts substantial burden on an end-user
and, without a fixed schema for the trace table, may amount to a fair bit of code munging.
Proposal #3 is inaccurate but works on trunk. Proposal #4 is essentially 1.2.0 without the
requirement to maintain any PBS-specific code and is a reasonable stop-gap before proposal
#1. All of these proposals are amenable to sampling.

I'd welcome your feedback on these proposals and next steps.
> Remove PBSPredictor
> -------------------
>                 Key: CASSANDRA-5455
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 2.0
>         Attachments: 5455.txt
> It was a fun experiment, but it's unmaintained and the bar to understanding what is going
> on is high.  Case in point: PBSTest has been failing intermittently for some time now, possibly
> even since it was created.  Or possibly not and it was a regression from a refactoring we
> did.  Who knows?
