incubator-wave-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Gentle <>
Subject Re: Delving into OT
Date Mon, 24 Jun 2013 22:36:24 GMT
On Mon, Jun 24, 2013 at 2:30 PM, Ali Lown <> wrote:
>> 1. CRDTs (like Logoot[1])
>> 2. 'Classical' OT using vector clocks for versioning.
>> 3. The OT system [...] similar to classical OT, except using git style
>> version hashes.
> A quick look at Logoot's paper says that deletion is not generally
> possible with CRDTs (p5 section 4) without either a) Tombstones or b)
> vector clocks.
> So, I suspect it would be possible to make use of CRDTs (which seem
> the best approach) with the git-style hash to fully allow deletion.

As I understand it (good chance I'm wrong), using tombstones with
CRDTs requires leaving in canonical names per tombstone. One of the
nice aspects of normal OT is that tombstones don't need names. If you
have "hello ..... world", you can represent that as ["hello ", 5, "
world"]. That means that at worst the document has a tombstone (an
integer) between every two characters.

As I understand it, when they say "use vector clocks" they mean
falling back to doing transformation for deletes. If we need
transformation anyway, I don't know if there's much point using CDRTs.
I had a chat with one of the original etherpad guys about this a few
weeks ago. (David Greenspan, who also worked on wave for awhile and
now does Meteor). He was talking about occasionally trying to sync all
peers to purge junk crdt markers, and talking about how it defeated
the point. He generally wasn't too impressed with CRDTs.

And then there's the problem of how to edit a tree like JSON. As I
said, I don't know if anyone knows how to do that yet.

>> CRDTs are probably the most elegant solution. They avoid the entire
>> complicated mess of transformation. However:
>> - They add a significant overhead to the document size (we must store
>> a unique name between every two characters in the wave)
>> - As far as I know, nobody has implemented arbitrary JSON editing
>> using CRDTs. Implementing a set is very difficult.
>> - Even without transform, we'll still have to store operations anyway
>> so peers can verify signatures.
>> I'd like to know what we should expect the actual document size
>> overhead to be. I expect it to be at least a 5x size increase, but I
>> don't know for sure. The papers I've read cleverly minimize the
>> overhead by only doing OT on a line-by-line basis, but thats
>> unacceptable for wave.
> Really? Wave currently has special handling for <line></line> elements
> in the document structure. (IIRC for the sole purpose of making its OT
> easier - there was some reason to not put the document as
> <line>data</line> but <line></line>data).

Turns out, you can't split an element with OT. So if you have
<line>data</line>, you can't break the line into two lines. So they
use special <line /> elements as newline characters (that works

When I say a line-by-line basis, I mean, you can't edit inside lines.
With logoot, you can only insert and delete lines. To make that work
for wave we would have to constantly replace lines as we edit them,
and you couldn't have two people both editing the same line at the
same time.

Might work great for source code management though.

> Many systems (e.g. *nix) take the idea of a line quite seriously since
> they all favour text-based approaches.
> The protocol shouldn't really be handling arbitrary binary data(should
> it?), and JSON can easily be arbitrary newlined without a problem)

No, you can't edit json using the same algorithm you use for text.
Imagine if we have a list like this: [1]. I insert a second element
(insert ",2") and you delete the 1. Now the JSON contains [,2], which
is invalid.

The right answer is to use a json-aware OT algorithm which understands
lists and sets. Instead of inserting text into a json string, you use
special list-insert, object-insert, etc operations. But as I say, I
don't know if this is possible with CRDTs.

>> [...] a plaintext implementation which does this in javascript[3], although
>> its quite slow (~150k transforms per second). I expect that an
>> optimized version of this will be around 10-20 million transforms /
>> second in C[4] for simple ops, and probably a few million per second
>> in java.
> Nice numbers! (Heh, can we see some actual data :) )

My C implementation is stupid fast compared to JS. This is plaintext,
not TP2. This is how op apply speed changes as the document size
Log scale:
Linear scale:

For transform, well:

libot josephg$ ./test -b
Benchmarking transform...
run 0
dl 10000 did 200000000 iterations in 1976.206000 ms: 101.204024 Miter/sec
run 1
dl 10000 did 200000000 iterations in 2012.051000 ms: 99.401059 Miter/sec

... thats a zero-allocation version with single character edits. But
yeah, its a fair bit better than the 150k transforms/second that I'm
doing in javascript at the moment.

>> When peers connect, they send each other missing ops. Figuring out
>> which ops are missing can be surprisingly tricky - but we'll figure
>> that out later. New ops must be ingested in order, so we always ingest
>> an operation after ingesting all of its parents.
> Alarm bells start ringing here... Care to elaborate on how we could
> reasonably do this, it sounds difficult.

I hear that. This problem alone has been bothering me for a few weeks.

First, if one peer only has a subset of the operations of another
peer, its easy. They both send their frontiers to each other, then the
peer with more ops recursively collects every op back to the other
peer's frontier and sends those.

If they both have ops the other doesn't have, its harder. Note that it
doesn't matter if you receive an op you already have again - you can
figure that out easily because of the hashes & discard. And ops are
small anyway. The big problem is round-trips - especially on mobile.
I've brainstormed a few options:
1. They just batch ops & send groups of ~20k ops to each other until
neither peer is missing anything
2. Peers could probably use bloom filters to figure out which ops to
send. These are small (~1.2 *bits* per op) but they scale linearly
with the number of ops in the document.
3. We could probably do some thing where peers send the ID of the last
history elemtnt they have, the second last, the 4th last, 8th last,
16th last etc that they have. The other peer says what they're
missing, and they send everything from that op onwards.

>> This algorithm requires a lot of transforms, but I think most of them
>> are unavoidable because of the signing / trust issue.
> Is there some way to use GPG-style signing chains to avoid having to
> retransform the whole thing whilst maintaining trust?

The problem is that you need to be able to read the operations to
transform them. Either the server can read your ops (and hence your
document, and so can the NSA) or your server can't read your ops and
can't transform.

>> It also requires
>> that every operation (ie, every key press) has a unique ID. Thats
>> probably going to be a 16 byte hash or something; which seems like a
>> lot of extra bytes for typing one character.
> This is a client problem. There is not always a reason to make a
> single character an operation.

Agree. Actually, the algorithm has an advantage here compared to the
old OT system: if you compose ops together, you can still transform
against the composed result. (In the old system, there were weird edge
cases where that would break OT).

So if a document is at version 1000 and someone sends a bunch of ops
at version 2, you could compose all ops from 2-1000 together, then
transform the new ops by that composed blob and apply them there. That
-might- be faster than transforming all the ops against all 998 other
ops individually.


>> Blah. That was a lot of words. Thoughts?
> See above. :)
> Ali

View raw message