Ingenious, Torben, certainly adds efficiency. John
On Mon, Jul 1, 2013 at 4:38 AM, Torben Weis <torben.weis@gmail.com> wrote:
> 2013/6/25 Joseph Gentle <josephg@gmail.com>
>
> >
> > >> When peers connect, they send each other missing ops. Figuring out
> > >> which ops are missing can be surprisingly tricky  but we'll figure
> > >> that out later. New ops must be ingested in order, so we always ingest
> > >> an operation after ingesting all of its parents.
> >
> > Just use a Merkle Tree that is at the same time a prefix tree with
> respect
> to the hashes of the ops (explanation below).
> The bandwidth usage is O(1) if both clients are in sync and O(log n) if
> they have one or few different ops and O(n) in the worst case, where n in
> the number of ops.
>
> Constructing the tree is simple.
> Let the hash function output 20 bytes and let's encode this in hex. This
> results in a hashstring of 40 hexcharacters for each operation.
> Each node hashes over the hashes of its children. Leafnodes correspond to
> operations and thus use the hash value of their respective operation.
> The treeinvariant is that all siblings on level x share the same prefix of
> x hexcharacters.
> The tree is not sent over the network. Instead, clients start comparing the
> hashes at the root.
>
> Two clients compare their root hash. If it is equal, the entire tree is
> equal and therefore they are in sync.
> If not, they download all direct children and repeat the procedure for each
> subtree rooted by one of these children.
> For example, if child number 3 has a different hash, but all others share
> the same hash, then we have learned that there are one or more ops with a
> hash of 3xxxx... that are different and need syncing.
>
> Typically we can limit the depth of the tree to few levels. 8 levels
> already yield a tree that could store 16^8 possible ops. So in the worst
> case two clients need to wait for 8 roundtrips to determine a missing op.
>
> In addition, each client sends a time stamp. So when syncing we report the
> last time stamp received from this client and ask for all ops this client
> received later. If these are few, then simply get them (even if we know
> some of the ops already, because we got them from another client). If there
> are too many ops, fall back to the merkle tree. With a good approximation
> of RTT and bandwidth, it is easy to calculate which algorithm is the best
> to sync two clients.
>
> Greetings
> Torben
>
