cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)
Date Fri, 06 Jul 2012 21:06:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408329#comment-13408329
] 

Jonathan Ellis commented on CASSANDRA-2698:
-------------------------------------------

Following the comments at the top, you want two things here:

# A histogram of TreeRange row counts
# For each pair of Merkle trees, the number of ranges that differ and the corresponding streamed
size of the data

1. is easy: add an EstimatedHistogram to the MerkleTree class and, once the ranges have finished
computing, iterate over each one and add its row count to the histogram
2. is a bit more involved: you want to extend the logging done by Differencer to include that
information, which is going to involve poking into the guts of (probably) MerkleTree.difference
and performStreamingRepair.
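
To make the two steps above concrete, here is a rough, standalone sketch. It does not use the real EstimatedHistogram, MerkleTree, or Differencer classes; RepairStatsSketch, bucketFor, histogram, and differenceSummary are hypothetical names, the power-of-two bucketing is an assumption chosen for simplicity, and the inputs (row counts per range, bytes streamed per differing range) are assumed to have been collected elsewhere.

```java
/**
 * Hypothetical sketch only: none of these names exist in Cassandra.
 * Inputs are plain arrays standing in for data gathered during repair.
 */
class RepairStatsSketch {

    /**
     * Assumed bucketing: bucket 0 holds empty ranges; bucket i (i >= 1)
     * holds ranges whose row count lies in [2^(i-1), 2^i).
     */
    static int bucketFor(long rowCount) {
        return rowCount <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(rowCount);
    }

    /** Step 1: fold the per-range row counts into a histogram. */
    static long[] histogram(long[] rowCountsPerRange) {
        long[] buckets = new long[64];
        for (long rows : rowCountsPerRange) {
            buckets[bucketFor(rows)]++;
        }
        return buckets;
    }

    /**
     * Step 2: for one pair of trees, build a log line with the number of
     * differing ranges and the total size streamed for them.
     */
    static String differenceSummary(String endpointA, String endpointB,
                                    long[] streamedBytesPerDifferingRange) {
        long totalBytes = 0;
        for (long bytes : streamedBytesPerDifferingRange) {
            totalBytes += bytes;
        }
        return String.format("%s <-> %s: %d differing ranges, %d bytes streamed",
                             endpointA, endpointB,
                             streamedBytesPerDifferingRange.length, totalBytes);
    }

    public static void main(String[] args) {
        long[] buckets = histogram(new long[] {0, 1, 3, 4, 5});
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] > 0) {
                System.out.println("bucket " + i + ": " + buckets[i] + " range(s)");
            }
        }
        System.out.println(differenceSummary("10.0.0.1", "10.0.0.2",
                                             new long[] {1024, 2048}));
    }
}
```

In the real code the histogram would presumably live on MerkleTree and the summary would be emitted from Differencer's existing logging; the sketch only shows the shape of the numbers being collected.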

I agree though that repair is an intimidating part of the code base.  If you want to start
with something simpler, that's fine too.
                
> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One hypothesis
is that the Merkle tree precision may deteriorate too much at some data size. To check this
hypothesis, it would be reasonable to gather statistics during Merkle tree building on
how many rows each Merkle tree range accounts for (and the size that this represents). It is
probably an interesting statistic to have anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
