cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
Date Mon, 13 Jan 2014 16:03:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869639#comment-13869639 ]

Jonathan Ellis commented on CASSANDRA-5263:
-------------------------------------------

You can estimate rows (partitions) in a range with the index sample.  SSTR.estimatedKeysForRanges
will do this for you.  (Until we have minhash or similar a la CASSANDRA-6474 you'll probably
want to assume worst-case, i.e. no overlap among the sstables.)
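
A minimal sketch of the worst-case estimate described above, assuming the per-sstable key estimates for the range are already in hand. The function names, the 20-level cap, and the one-leaf-per-partition sizing rule are illustrative assumptions, not Cassandra's actual code. With no overlap assumed, the estimates simply add up:

```python
def worst_case_partitions(per_sstable_estimates):
    """Upper bound on partitions in a range: assume no key is shared
    between sstables, so the per-sstable estimates add up."""
    return sum(per_sstable_estimates)


def depth_for(partitions, max_depth=20):
    """Smallest tree depth whose leaf count (2**depth) is at least
    `partitions`, capped at `max_depth` (hypothetical cap)."""
    if partitions <= 1:
        return 0
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using exact integer math.
    return min(max_depth, (partitions - 1).bit_length())


# Hypothetical per-sstable estimates for one repaired range.
estimates = [400_000, 250_000, 350_000]
total = worst_case_partitions(estimates)
print(total, depth_for(total))
```

Summing overestimates whenever sstables share keys, so the chosen depth errs toward a finer tree rather than a coarser one.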

100MB isn't much in an 8GB heap.  I don't think we need to worry about that.

Is the tree building cpu bound or i/o bound?

> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-5263
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>    Affects Versions: 1.1.9
>            Reporter: Ahmed Bashir
>            Assignee: Minh Do
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.  This value
> should be configurable, just like phi_convict_threshold and other properties.
> Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons
> can result in a large number of row keys being streamed unnecessarily.
> Empirical testing indicates that reasonable changes to this depth (18, 20, etc.) don't
> affect the Merkle tree generation and differencing timings all that much, and they can significantly
> reduce the amount of data being streamed during repair.
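
To illustrate why depth matters for streaming, here is a rough sketch of how many row keys share one leaf at the depths mentioned above. The figure of one billion keys and the assumption that keys spread evenly over the leaves are hypothetical, chosen only for the arithmetic. A single mismatched key marks its whole leaf range as different, so the keys-per-leaf figure approximates what gets streamed for that one difference:

```python
def keys_per_leaf(total_keys, depth):
    """Average number of row keys covered by one leaf of a tree with
    2**depth leaves, assuming keys are spread evenly (an assumption)."""
    return total_keys / (2 ** depth)


# Hypothetical node owning one billion row keys.
TOTAL_KEYS = 1_000_000_000

for depth in (15, 18, 20):
    leaves = 2 ** depth
    print(f"depth={depth:2d} leaves={leaves:>9,d} "
          f"keys/leaf={keys_per_leaf(TOTAL_KEYS, depth):,.1f}")
```

Each extra level halves the keys per leaf, so moving from depth 15 to depth 20 shrinks the range streamed per mismatch by a factor of 32 under these assumptions.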



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
