On Mon, Dec 31, 2012 at 11:24 AM, James Masson <james.masson@opigram.com> wrote:

Well, it turns out the Read Request Latency graph in OpsCenter is highly misleading.

Using jconsole, the read latency for the column family in question is actually normally around 800 microseconds, punctuated by occasional big spikes that drive up the averages.

Towards the end of the batch process, the OpsCenter-reported average latency is above 4000 microseconds, and forced compactions no longer help drive the latency back down.

I'm going to stop relying on OpsCenter data for performance analysis; it just doesn't have the resolution.

James, it's worth pointing out that Read Request Latency in OpsCenter is measured at the coordinator level, so it includes the time spent sending requests to replicas and waiting for their responses. There's another, per-column-family latency metric named Local Read Latency; it sounds like this is the equivalent of the number you were looking at in jconsole. That metric covers only the time to read local caches/memtables/sstables.
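For anyone who wants to script this comparison instead of eyeballing jconsole, here is a minimal JMX-client sketch. It assumes Cassandra's default JMX port (7199), the era's StorageProxy MBean for the coordinator-level number, and the per-column-family ColumnFamilies MBean for the local number; "MyKeyspace" and "MyCF" are placeholder names, and the "RecentReadLatencyMicros" attribute is an assumption about this Cassandra version's MBean interface, so check the bean in jconsole first.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadLatencyCompare {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Cassandra's default JMX port is 7199
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();

            // Coordinator-level latency: includes replica round trips
            // (what OpsCenter's "Read Request Latency" graph tracks)
            ObjectName coordinator =
                    new ObjectName("org.apache.cassandra.db:type=StorageProxy");

            // Local per-CF latency: local caches/memtables/sstables only
            // ("MyKeyspace"/"MyCF" are hypothetical placeholder names)
            ObjectName local = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,"
                    + "keyspace=MyKeyspace,columnfamily=MyCF");

            // Attribute name is an assumption for this era's MBeans;
            // verify it against your node in jconsole
            double coord = (Double) mbsc.getAttribute(
                    coordinator, "RecentReadLatencyMicros");
            double cf = (Double) mbsc.getAttribute(
                    local, "RecentReadLatencyMicros");

            System.out.printf("coordinator read latency: %.0f us%n", coord);
            System.out.printf("local CF read latency:    %.0f us%n", cf);
        }
    }
}
```

Run against a live node (e.g. `java ReadLatencyCompare my-node`) and the gap between the two numbers is the replica/network component that the coordinator-level graph folds in.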

We are looking to rename one or both of the metrics for clarity; any input here would be helpful. For example, we're considering "Coordinated Read Request Latency" or "Client Read Request Latency" in place of just "Read Request Latency".

Tyler Hobbs