cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)
Date Thu, 21 Mar 2013 22:39:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609625#comment-13609625
] 

Benedict commented on CASSANDRA-2698:
-------------------------------------

Hi,

I've uploaded a patch for this issue (patch.diff - apologies for the potentially future-clashing
name). Logging is performed in two places, one on each node involved in a comparison:

1) On the requesting node in AntiEntropyService.Difference.run(), after the MerkleTree difference
is calculated and before the StreamingRepairTask is created
2) On the source node, on which StreamingRepairTask is run, in StreamOut.createPendingFiles()

In both cases we log, at debug level, a sample of the largest ranges followed by a histogram
of the range size distribution.  The first is achieved by inserting each range directly into
an EstimatedHistogram, on which we call the new logSummary() method; the second by calling
the new groupByFrequency() method on that same histogram, to yield a histogram based on the
frequency of sizes present in the original (on which we simply call log()).
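
To give a rough idea of the shape of the logged output (largest ranges first, then a size
distribution), here is a minimal standalone sketch. It is not the patch code: RangeSizeSummary,
largest() and distribution() are hypothetical names, and the power-of-two buckets are an
assumption made for brevity. EstimatedHistogram uses its own bucket offsets, and logSummary()
and groupByFrequency() are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the patch: summarises range sizes in bytes.
final class RangeSizeSummary {
    private final List<Long> sizes = new ArrayList<>();

    void add(long sizeInBytes) {
        sizes.add(sizeInBytes);
    }

    // Returns up to n of the largest recorded sizes, biggest first.
    List<Long> largest(int n) {
        List<Long> sorted = new ArrayList<>(sizes);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Maps each bucket's upper bound (a power of two) to the number of sizes in that bucket.
    Map<Long, Integer> distribution() {
        Map<Long, Integer> buckets = new TreeMap<>();
        for (long size : sizes) {
            buckets.merge(bucketBound(size), 1, Integer::sum);
        }
        return buckets;
    }

    // Smallest power of two >= size (assumes size is below 2^62, so the shift cannot overflow).
    private static long bucketBound(long size) {
        if (size <= 1) {
            return 1;
        }
        long highest = Long.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }
}
```

Logging largest(n) and then the entries of distribution() gives output of the general form
described above.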

In case 1, we construct the MerkleTree to include a size taken from the AbstractCompactedRow
we compute the hash from, and use this in MerkleTree.difference() to estimate the size of mismatching
ranges. This estimate tends to come in around 15% below the size reported by StreamOut. One
design decision of note here: instead of modifying AbstractCompactedRow to return a size (which
would be invasive and in some cases incur an unnecessary penalty) we use a custom implementation
of MessageDigest that counts the number of bytes provided to it.
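
The byte-counting digest idea can be sketched as follows. This is a minimal illustration under
assumptions, not the class from the patch: the name CountingDigest and the accessor bytesSeen()
are hypothetical. It wraps an existing MessageDigest, forwards every update to it, and keeps a
running total of the bytes it was given.

```java
import java.security.MessageDigest;

// Hypothetical name; the actual class in the patch is not shown in this message.
final class CountingDigest extends MessageDigest {
    private final MessageDigest delegate;
    private long count;

    CountingDigest(MessageDigest delegate) {
        super(delegate.getAlgorithm());
        this.delegate = delegate;
    }

    @Override
    protected void engineUpdate(byte input) {
        delegate.update(input);
        count++;
    }

    @Override
    protected void engineUpdate(byte[] input, int offset, int len) {
        delegate.update(input, offset, len);
        count += len;
    }

    @Override
    protected byte[] engineDigest() {
        // The delegate resets itself here; the byte count is kept so that
        // it can still be read after the hash has been produced.
        return delegate.digest();
    }

    @Override
    protected void engineReset() {
        delegate.reset();
        count = 0;
    }

    // Number of bytes passed to update() since construction or the last reset().
    long bytesSeen() {
        return count;
    }
}
```

Because the wrapper is itself a MessageDigest, code that already feeds row data into a digest
can be handed a CountingDigest without changes, and the size is read back afterwards with
bytesSeen().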

Case 2 is much simpler, as we already have the ranges and their sizes available to us.

There are some other changes, particularly in MerkleTree, with some refactoring, renames and new
subclasses as part of updating MerkleTree.difference(). In particular, TreeDifference is returned
instead of TreeRange (to accommodate the extra size information), and it replaces TreeRange
throughout that method's call tree where applicable; hash() and hashHelper() have also
been renamed to find() and findHelper(), with a new hash() implementation that depends on find().
I'm sure there are other minutiae, but hopefully nothing too opaque. If you need any clarification,
feel free to ask.
                
> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz, patch_2698_v1.txt, patch.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One hypothesis
is that the merkle tree precision may deteriorate too much at some data size. To check this
hypothesis, it would be reasonable to gather statistics during the merkle tree building on
how many rows each merkle tree range accounts for (and the size that this represents). It is
probably an interesting statistic to have anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
