cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)
Date Thu, 11 Apr 2013 21:59:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629448#comment-13629448
] 

Yuki Morishita commented on CASSANDRA-2698:
-------------------------------------------

Benedict,

bq. Since I need to reinsert all the records once I've decided this anyway, I need to retain
them all, which I chose to do in EstimatedHistogram as they do, ...

If you do that, you should create your own class with labels and an array, since you're not
using the default offsets or any of the other histogram-related methods. It confused me at first
why you were calling addToIndex on EstimatedHistogram.
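
A minimal sketch of what such a class could look like. The name LabeledCounts and its methods are hypothetical and not part of Cassandra or of the attached patches; it only illustrates pairing a label array with a count array instead of reusing EstimatedHistogram.

```java
/** Hypothetical holder for labelled counters; an illustration only, not Cassandra code. */
final class LabeledCounts
{
    private final String[] labels; // one label per bucket, e.g. a description of a tree range
    private final long[] counts;   // counts[i] is the value recorded for labels[i]

    LabeledCounts(String[] labels)
    {
        this.labels = labels.clone();
        this.counts = new long[labels.length];
    }

    /** Adds delta to the bucket at the given index. */
    void add(int index, long delta)
    {
        counts[index] += delta;
    }

    long get(int index)
    {
        return counts[index];
    }

    /** Sum over all buckets. */
    long total()
    {
        long sum = 0;
        for (long c : counts)
            sum += c;
        return sum;
    }

    /** Renders "label=count" pairs separated by ", ". */
    @Override
    public String toString()
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < labels.length; i++)
        {
            if (i > 0)
                sb.append(", ");
            sb.append(labels[i]).append('=').append(counts[i]);
        }
        return sb.toString();
    }
}
```

Keeping the labels next to the counts means a reader of the code does not have to wonder which histogram offsets apply, because none do.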

But looking at this from the beginning again, what we want to see is whether the Merkle tree
has evenly distributed keys (or rows) in each hash. You can use EstimatedHistogram or your own
class to show that. For now, instead of sending stats back to the coordinator, it is fine to
just use the logger to log that distribution at the end of Merkle tree creation, along with the
corresponding repair session ID.
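
A rough sketch of that kind of logging, under stated assumptions: ValidationStats is a hypothetical class, the input is assumed to be an array holding the number of rows hashed into each tree range, and java.util.logging is used only to keep the example self-contained (it is not a claim about which logging API the validation code uses).

```java
import java.util.UUID;
import java.util.logging.Logger;

/** Hypothetical summary of rows per Merkle tree range; an illustration only. */
final class ValidationStats
{
    private static final Logger logger = Logger.getLogger(ValidationStats.class.getName());

    final int ranges;  // number of tree ranges observed
    final long total;  // total rows across all ranges
    final long min;    // fewest rows in any single range
    final long max;    // most rows in any single range

    ValidationStats(long[] rowsPerRange)
    {
        long lo = Long.MAX_VALUE, hi = 0, sum = 0;
        for (long rows : rowsPerRange)
        {
            lo = Math.min(lo, rows);
            hi = Math.max(hi, rows);
            sum += rows;
        }
        this.ranges = rowsPerRange.length;
        this.total = sum;
        this.min = ranges == 0 ? 0 : lo;
        this.max = hi;
    }

    /** Average rows per range; 0 when there are no ranges. */
    double mean()
    {
        return ranges == 0 ? 0.0 : (double) total / ranges;
    }

    /** Writes a single line tagged with the repair session ID. */
    void log(UUID sessionId)
    {
        logger.info(String.format("[repair #%s] Merkle tree ranges=%d rows=%d min=%d max=%d mean=%.1f",
                                  sessionId, ranges, total, min, max, mean()));
    }
}
```

A large gap between min and max relative to the mean would suggest that rows are not evenly spread across the tree ranges, which is the condition the ticket wants to detect.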

For the streaming part, it is hard to tell which stream session belongs to a given repair in
the current code base (we can only see whether it is repair related by looking at OperationType).
So we need to improve that, and I'm working on it as part of the repair and streaming redesign
(CASSANDRA-5426, CASSANDRA-5286). So, let's focus on the former, the validation part.

bq. Do you have an Eclipse formatter profile I could use for your coding convention?

Sorry, I use IntelliJ, but I think someone on #cassandra-dev on IRC has one.
                
> Instrument repair to be able to assess it's efficiency (precision)
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz, patch_2698_v1.txt, patch.diff,
patch-rebased.diff
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One hypothesis
is that the Merkle tree precision may deteriorate too much at some data size. To check this
hypothesis, it would be reasonable to gather statistics during the Merkle tree building on
how many rows each Merkle tree range accounts for (and the size that this represents). It is
probably an interesting statistic to have anyway.

