incubator-wave-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Torben Weis <torben.w...@gmail.com>
Subject Re: Delving into OT
Date Mon, 01 Jul 2013 08:38:32 GMT
2013/6/25 Joseph Gentle <josephg@gmail.com>

>
> >> When peers connect, they send each other missing ops. Figuring out
> >> which ops are missing can be surprisingly tricky - but we'll figure
> >> that out later. New ops must be ingested in order, so we always ingest
> >> an operation after ingesting all of its parents.
>
> Just use a Merkle Tree that is at the same time a prefix tree with respect
to the hashes of the ops (explanation below).
The bandwidth usage is O(1) if both clients are in sync and O(log n) if
they have one or few different ops and O(n) in the worst case, where n in
the number of ops.

Constructing the tree is simple.
Let the hash function output 20 bytes and let's encode this in hex. This
results in a hash-string of 40 hex-characters for each operation.
Each node hashes over the hashes of its children. Leaf-nodes correspond to
operations and thus use the hash value of their respective operation.
The tree-invariant is that all siblings on level x share the same prefix of
x hex-characters.
The tree is not sent over the network. Instead, clients start comparing the
hashes at the root.

Two clients compare their root hash. If it is equal, the entire tree is
equal and therefore they are in sync.
If not, they download all direct children and repeat the procedure for each
sub-tree rooted by one of these children.
For example, if child number 3 has a different hash, but all others share
the same hash, then we have learned that there are one or more ops with a
hash of 3xxxx... that are different and need syncing.

Typically we can limit the depth of the tree to few levels. 8 levels
already yield a tree that could store 16^8 possible ops. So in the worst
case two clients need to wait for 8 round-trips to determine a missing op.

In addition, each client sends a time stamp. So when syncing we report the
last time stamp received from this client and ask for all ops this client
received later. If these are few, then simply get them (even if we know
some of the ops already, because we got them from another client). If there
are too many ops, fall back to the merkle tree. With a good approximation
of RTT and bandwidth, it is easy to calculate which algorithm is the best
to sync two clients.

Greetings
Torben

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message