cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "NodeTool" by IanDanforth
Date Wed, 10 Aug 2011 17:22:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "NodeTool" page has been changed by IanDanforth:

  == Scrub ==
  Cassandra v0.7.1 and v0.7.2 shipped with a bug that caused incorrect row-level bloom filters
to be generated when compacting sstables generated with earlier versions.  This would manifest
in IOExceptions during column name-based queries.  v0.7.3 provides "nodetool scrub" to rebuild
sstables with correct bloom filters, with no data lost. (If your cluster was never on 0.7.0
or earlier, you don't have to worry about this.)  Note that nodetool scrub will snapshot your
data files before rebuilding, just in case.
+ == Cfhistograms ==
+ Excellent description from:
+ The output of the command has the following 6 columns:
+  * Offset
+  * SSTables
+  * Write Latency
+  * Read Latency
+  * Row Size
+  * Column Count
+ === Interpreting the output ===
+  * Offset: The series of values to which the counts in the other five columns correspond, i.e. the X-axis values of the histograms. The unit depends on the column being read.
+  * SSTables: The number of SSTables accessed per read. For example, if a read operation accessed 3 SSTables, you will find a positive value against Offset 3. The values are recent, i.e. they cover the interval between two successive calls.
+  * Write Latency: The distribution of the number of operations across the range of Offset values, which represent latency in microseconds. For example, if 100 operations each took 5 microseconds, you will find a positive value against Offset 5.
+  * Read Latency: Similar to Write Latency. The values are recent, i.e. they cover the interval between two successive calls.
+  * Row Size: The distribution of rows across the range of Offset values, which represent size in bytes. For example, if you have 100 rows of 2000 bytes each, you will find a positive value against the smallest Offset that is at least 2000 (2299 with the default offsets), since 2000 is not itself a bucket offset.
+  * Column Count: Similar to Row Size, except that the Offset values represent the number of columns per row.
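Each column above can be read as a histogram of (Offset, count) pairs. As a minimal Python sketch, assuming you have copied the rows of one column into a list of pairs by hand (the helper names and sample numbers are hypothetical, not part of nodetool or its real output), the following estimates the mean number of SSTables per read and a latency percentile. Because each count covers a range of values ending at its Offset, the percentile returned is the upper bound of the bucket it falls in.

```python
def weighted_mean(rows):
    """Approximate mean for one column, using each Offset as its bucket's value.

    rows: (offset, count) pairs in ascending Offset order (hypothetical input).
    Accurate when consecutive Offsets differ by 1 (e.g. SSTables 1, 2, 3);
    for larger Offsets it is biased upward, because each count covers a
    range of values that ends at its Offset.
    """
    total = sum(count for _, count in rows)
    if total == 0:
        return None
    return sum(offset * count for offset, count in rows) / total


def estimate_percentile(rows, pct):
    """Smallest Offset at which the cumulative count reaches pct percent of the total."""
    total = sum(count for _, count in rows)
    if total == 0:
        return None
    target = total * pct / 100.0
    running = 0
    for offset, count in rows:
        running += count
        if running >= target:
            return offset
    return rows[-1][0]


# Hypothetical sample rows, for illustration only.
sstables = [(1, 70), (2, 20), (3, 10)]  # reads touching 1, 2 or 3 SSTables
read_latency = [(103, 12), (124, 40), (149, 30), (179, 15), (215, 3)]  # microseconds

print(weighted_mean(sstables))                # → 1.4
print(estimate_percentile(read_latency, 50))  # → 124
print(estimate_percentile(read_latency, 95))  # → 179
```

The sketch treats the rows passed in as the whole population, so any values counted beyond the last listed Offset are ignored.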
+ === Some additional details ===
+ A histogram typically plots values over discrete intervals, and Cassandra likewise defines buckets. There is one more bucket than there are bucket offsets: the last bucket counts values greater than the last offset. The values in the Offset column of the output are the bucket offsets.
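A minimal Python sketch of this bucket layout, assuming a value is counted against the smallest offset greater than or equal to it (a reading of the description above, not code taken from the EstimatedHistogram class). The offsets used here are a hypothetical toy list, not Cassandra's defaults.

```python
from bisect import bisect_left


def bucket_index(offsets, value):
    """Index of the bucket that counts `value`.

    Bucket 0 counts values up to offsets[0]; bucket i (0 < i < len(offsets))
    counts values above offsets[i-1] and up to offsets[i]; index len(offsets)
    is the extra bucket for values above the last offset.
    """
    return bisect_left(offsets, value)


def build_histogram(offsets, values):
    buckets = [0] * (len(offsets) + 1)  # one more bucket than there are offsets
    for value in values:
        buckets[bucket_index(offsets, value)] += 1
    return buckets


toy_offsets = [1, 2, 3, 5, 8]  # hypothetical offsets for illustration only
print(build_histogram(toy_offsets, [1, 3, 4, 4, 9, 100]))  # → [1, 0, 1, 2, 0, 2]
```

Here both values of 4 land in the bucket for offset 5, and 9 and 100 land in the extra bucket, which has no offset of its own.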
+ The bucket offsets start at 1 and grow by a factor of 1.2 each time (rounding and removing duplicates). With the default of 90 offsets (90+1 buckets) they run from 1 to 25,109,160, which gives latency resolution from microseconds up to about 25 seconds, with less precision as the numbers get larger (see the EstimatedHistogram class).
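The offset series can be sketched in Python as follows. It assumes that "rounding and removing duplicates" means each offset is the previous one times 1.2, rounded to the nearest integer, and bumped by 1 whenever rounding would repeat the previous value; the function name and that interpretation are assumptions, not copied from EstimatedHistogram. Under that assumption, 90 offsets (91 buckets) end at 25,109,160.

```python
def bucket_offsets(size=90):
    """Offsets starting at 1, each the previous one times 1.2, rounded.

    When rounding would repeat the previous offset (which happens only at the
    very start, e.g. round(1 * 1.2) == 1), the offset is bumped by 1 so every
    offset is distinct.
    """
    offsets = [1]
    last = 1
    while len(offsets) < size:
        nxt = round(last * 1.2)
        if nxt == last:  # rounding produced a duplicate; step past it
            nxt += 1
        offsets.append(nxt)
        last = nxt
    return offsets


offs = bucket_offsets()
print(offs[:12])                # → [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17]
print(len(offs) + 1, offs[-1])  # → 91 25109160
```

The small offsets are consecutive integers, which is why low SSTable counts get an exact Offset each, while the gaps widen to about 20% at large values.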
