cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Lemmen <>
Subject merkle anti-entropy and balancing of trees
Date Tue, 04 Oct 2016 16:21:05 GMT
hi everyone,

I am trying to understand anti-entropy measures in general and 
specifically the ones used in cassandra. The more I look at it the more
I get confused, so I would be glad if anyone could shed some light on
the following question, or point me in the right direction:

on the one hand a search tree needs to be reasonably balanced for it to
remain efficient in lookup and insertion. On the other hand, for a
merkle tree we want the trees that are being compared to be as similar
as possible, where the underlying data allows. If we for example assume
the two sides to contain values [1,2,3] and [0,1,2,3] respectively,
and assume this tree for the "left" side:

   / \
  B   3
 / \
1   2 

a balanced tree for the rigth hand side would look like this:

   /     \
  D       E    
 / \     / \
0   1   2   3

this is nice for search/insert, but means that syncing the two trees
requires a full sync of all nodes (to make right := left), as all
internal node hashes are different.

an unbalanced tree could, if unbalanced just the right way, look like

     / \
    b   3
   / \
  X   2 
 / \
0   1

in this case the sync would quickly converge on "0" being the only
different leaf. In this example the actual number of messages exchanged
is not that different, because there are so few inner nodes and all of
them lie on the path to "0". but if you extend the example to a larger
tree then the difference becomes quite dramatic.

One obvious way around this would be to not use a balanced tree, but one
where each item is in a more predictable position in the tree. in the
example above (integer keys), you could for example use the least
significant bit at the root to distinguish whether items are "left" or
"right" of the node, and the next bit on the level below:

   /   \
  g     h
 / \   / \
0   2 1   3

a node with only one child would collapse:

   /   \
  2     h
       / \
      1   3

which would preserve the structure of parts of the tree that are similar
between the two (h and below here). Obviously the unbalanced nature is a
problem, as is the fact that you can't do range lookups anymore. But at
the same time the full-sync on single insert sounds quite pathological
as well...

So I am wondering: is this a problem? how does cassandra deal with it?
does it actually use fully balanced trees? Is there any literature on

thanks  robert

Robert Lemmen                      

View raw message