cassandra-commits mailing list archives

From "Chris Lohfink (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions
Date Sat, 25 Oct 2014 01:10:34 GMT


Chris Lohfink updated CASSANDRA-7974:
    Attachment: cassandra-2.1-7974v2.txt

I attached a version with a few extras:
* Includes sampling of writes
* Exposes the partition type in JMX so that nodetool can serialize the blobs as strings
* Includes the margin of error from the summary
* Adds defaults for capacity and topK count to make it simpler to use; either can be overridden with options
** Capacity is not set to the topK count, since the summary becomes very inaccurate if cardinality vastly exceeds capacity (with capacity=10, a cardinality of just 100 would be very inaccurate for many workloads)
** Prints the estimated cardinality (using HyperLogLog) so that it's easier to identify an appropriate capacity if the margin of error is unacceptable
* If sampling is disabled there's no blocking (as opposed to synchronizing addSample)
** The case where sampling is enabled is also non-blocking
* Made it easy to add additional samplers; I would like to add a "columns count" or "size" sampler as well
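
To illustrate why capacity matters, here is a minimal sketch of a Space-Saving style top-K counter, the general technique behind this kind of stream summary. It is not the code in the attached patch and does not use stream-lib; the class name SpaceSavingSketch, its Counter type, and the offer/topK methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Space-Saving top-K counter (hypothetical; not the patch's implementation). */
class SpaceSavingSketch {
    /** One tracked key: its estimated count and the most it may be overcounted by. */
    static final class Counter {
        final String key;
        long count;
        final long error;

        Counter(String key, long count, long error) {
            this.key = key;
            this.count = count;
            this.error = error;
        }
    }

    private final int capacity;
    private final Map<String, Counter> counters = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Records one hit for the given key. */
    synchronized void offer(String key) {
        Counter existing = counters.get(key);
        if (existing != null) {
            existing.count++;
            return;
        }
        if (counters.size() < capacity) {
            counters.put(key, new Counter(key, 1, 0));
            return;
        }
        // Full: evict the smallest counter. The new key inherits its count,
        // and that inherited amount is recorded as the new key's error.
        Counter min = null;
        for (Counter c : counters.values()) {
            if (min == null || c.count < min.count) {
                min = c;
            }
        }
        counters.remove(min.key);
        counters.put(key, new Counter(key, min.count + 1, min.count));
    }

    /** Returns up to k counters, highest estimated count first. */
    synchronized List<Counter> topK(int k) {
        List<Counter> sorted = new ArrayList<>(counters.values());
        sorted.sort(Comparator.comparingLong((Counter c) -> c.count).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }
}
```

In this sketch every eviction hands the evicted count to the incoming key as error, so when the number of distinct keys far exceeds capacity, evictions are frequent and the error terms grow. That is the reason for keeping capacity well above the topK count. Note that this toy version synchronizes offer, which is exactly the blocking behaviour the patch tries to avoid.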

output looks like:
READ Sampler:
  Cardinality: ~235 (256 capacity used)
  Top 10 partitions:
	Partition                        Count       +/-
	4BpaP7j05i:true                      1         0
	jSvq6b62uXwfQb:true                  1         0
	BvkRbLI1rKO:true                     1         0

WRITE Sampler:
  Cardinality: ~4681 (256 capacity used)
  Top 10 partitions:
	Partition                          Count       +/-
	jXyI4PpocdtXAkvxG8geS1bkY:true        49        10
	bid3tbjRKzDZ4l5Wu:true                29        12
	cWti3ryllghSxOGEuG:true               19        18
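
As a rough guide to reading the Count and +/- columns, the sketch below assumes the +/- value is the summary's error term and that, as in Space-Saving summaries generally, it bounds only how much a count may be overestimated. The class SampleBounds and its method are hypothetical helpers, not part of the patch or of nodetool.

```java
/** Hypothetical helper for interpreting a reported (count, error) pair. */
class SampleBounds {
    /**
     * Lowest true hit count consistent with the estimate, assuming the
     * error term is an upper bound on overcounting. Never below zero.
     */
    static long guaranteedCount(long count, long error) {
        return Math.max(0L, count - error);
    }
}
```

Under that assumption, the three WRITE rows above (49 +/- 10, 29 +/- 12, 19 +/- 18) correspond to at least 39, 17, and 1 real hits respectively, so only the first partition is clearly ahead of the others. A larger capacity would tighten these ranges.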

> Enable tooling to detect hot partitions
> ---------------------------------------
>                 Key: CASSANDRA-7974
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>         Attachments: 7974.txt, cassandra-2.1-7974v2.txt
> Sometimes you know you have a hot partition by the load on a replica set, but have no way of determining which partition it is.  Tracing is inadequate for this without a lot of post-tracing analysis that might not yield results.  Since we already include stream-lib for HLL in compaction metadata, it shouldn't be too hard to wire up topK for X seconds via jmx/nodetool and then return the top partitions hit.

This message was sent by Atlassian JIRA
