Always use even distribution for Merkle tree with RandomPartitioner
-------------------------------------------------------------------
Key: CASSANDRA-2841
URL: https://issues.apache.org/jira/browse/CASSANDRA-2841
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
Fix For: 0.7.7, 0.8.2
Attachments: 2841.patch
When creating the initial Merkle tree, repair tries to be (too) smart and uses the key samples
to "guide" the tree splitting. While this is a good idea for OPP, where there is a good chance
the data distribution is uneven, you can't beat an even distribution for the RandomPartitioner.
A quick experiment even shows that the sample-guided method is significantly less efficient than
an even split of the Merkle tree's ranges (that is, an even split gives
a much better distribution of the number of keys per range of the tree).
Thus let's switch to an even distribution for RandomPartitioner. That three-line change alone
amounts to a significant improvement in repair's precision.
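
The effect described above can be reproduced with a small simulation. The sketch below is not Cassandra's code: it assumes uniformly random tokens in a 2**127 space, models "sample-guided" splitting as repeatedly halving the leaf that contains a randomly chosen key sample, and takes every 128th key as a sample. All function names and parameters are hypothetical and exist only for this illustration.

```python
import bisect
import random

TOKEN_SPACE = 2 ** 127  # assumed token range for this illustration


def even_boundaries(num_leaves):
    """Right boundaries of num_leaves equal-width ranges over the token space."""
    return [TOKEN_SPACE * (i + 1) // num_leaves for i in range(num_leaves)]


def sample_guided_boundaries(samples, num_leaves, rng):
    """Halve the leaf containing a randomly picked sample until num_leaves exist.

    This is a simplified model of sample-guided splitting, not the real code.
    """
    bounds = [TOKEN_SPACE]  # sorted right boundaries; the first leaf starts at 0
    while len(bounds) < num_leaves:
        token = rng.choice(samples)
        i = bisect.bisect_left(bounds, token)  # index of the leaf holding token
        left = bounds[i - 1] if i > 0 else 0
        mid = (left + bounds[i]) // 2
        if mid == left:
            continue  # leaf too narrow to split; pick another sample
        bounds.insert(i, mid)
    return bounds


def keys_per_leaf(bounds, keys):
    """Count how many keys fall into each leaf range."""
    counts = [0] * len(bounds)
    for key in keys:
        counts[bisect.bisect_left(bounds, key)] += 1
    return counts


def compare(num_keys=20000, num_leaves=64, sample_every=128, seed=1):
    """Return (even_counts, guided_counts) for one simulated data set."""
    rng = random.Random(seed)
    keys = [rng.randrange(TOKEN_SPACE) for _ in range(num_keys)]
    samples = keys[::sample_every]  # assumed sampling interval
    even = keys_per_leaf(even_boundaries(num_leaves), keys)
    guided = keys_per_leaf(
        sample_guided_boundaries(samples, num_leaves, rng), keys
    )
    return even, guided


if __name__ == "__main__":
    even, guided = compare()
    print("even split:    min/max keys per leaf =", min(even), max(even))
    print("sample-guided: min/max keys per leaf =", min(guided), max(guided))
```

In this model the even split keeps every leaf close to num_keys / num_leaves keys, while the sample-guided split leaves some ranges several times wider than others. Since a mismatching range has to be streamed as a whole, the widest range bounds how precise repair can be.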