incubator-wave-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Gentle <jose...@gmail.com>
Subject Re: JSON editing and move conflicts
Date Sun, 07 Dec 2014 22:14:02 GMT
On Wed, Dec 3, 2014 at 1:46 AM, Alex North <alex@alexn.id.au> wrote:
> <blast from="the-past">
>
> Joseph's earlier comments suggest you already realise this, but Wave
> implements map structures as list-of-pairs, so conflicting operations end
> up both surviving at the protocol level, with a consistent order reflecting
> their server order, so that conflict resolution can be left to the
> application. It's a bit obscure because the list-of-pairs is embedded in
> the wave doc structure, which isn't immediately obvious as essentially
> nested lists. This storage/transport representation doesn't constrain the
> conceptual application models built on top of it, where the data may then
> be converted to a HashMap, JS object etc via application logic.
> Occasionally we used element attributes as a model if last-one-wins was
> right for the application.
>
> We thought lots about this for different cases and this reflects a general
> rule we discovered: you can't design protocol-level conflict resolution in
> a re-useable way to suit very many different applications. Something that
> works well for one application will confound another. Adding complexity at
> this level pretty much always turned out poorly, and where we needed a tie
> break it went in favour of document editing (e.g. cursors after concurrent
> insertion). Hence both lack of a map primitive and the lack of a move
> operation in favour of delete/insert.

Yeah, honestly I still feel like the wave data model is simultaneously
trying to do too much and too little. It tries a little bit to be all
things to all people (by letting you write your own OT code on top of
the data model), and the data model doesn't support some things you
actually want to do. I'm very happy with how a system built on the new
json2 stuff will come out - which is based on a lot of ideas from
discussions with people on this list.

My general plan is to have documents be a json tree, and at the leaves
of the tree you can embed arbitrary data with arbitrary OT semantics.
The tree will support insert/delete/move/insert-if-null operations on
maps and lists and everything else is at the leaves.

I semi-agree with you about reusable conflict resolution. I think
conflict resolution is very different in *some* different
applications. But for standard boring CRUD apps, a decent json OT type
(tree of lists, maps & strings) captures about 90% of what you want to
do. And that makes sense, because these apps have traditionally been
last-writer-wins anyway. I'd still like to have a good story around
references - if users start doing something like this:
{data:{hash1:{}, hash2:{}}, someref:"hash1"}, there's lots of gnarly
edge cases which can come up. And I think wave's logic-on-top might be
the right way to deal with that.

With a plain text and rich text ot types, I think this will cover most
applications. There's a semi-gross hack that the rich text type will
probably (recursively) need to be able to embed arbitrary JSON / other
OT type data too.

I wasn't there during those meetings - Alex can you remember any other
use cases? Is there anything this won't support?

We'll still need to do something a bit special if we want to build a
source control system using all of this, but thats manageable. (We
will have to think about how to make conflict-free editing (pair
programming) work with 'normal' scm merging which should generate
conflicts.)

-J

> P.S. Joseph, I hear your in Sydney. Look forward to seeing you.
>
> </blast>
>
> On Thu Oct 23 2014 at 6:08:38 AM Joseph Gentle <josephg@gmail.com> wrote:
>
>> On Wed, Oct 22, 2014 at 2:50 AM, Patrick Coleman <patcoleman@google.com>
>> wrote:
>> >>
>> >> - I'm not sure if I want to make this OT type invertable. If we don't,
>> >
>> > What are the costs of making it invertable? i.e. are there particular
>> > reasons not to?
>> > I guess for proper invertability, you'd need to copy all the data being
>> > moved?
>>
>> I honestly don't know what the complexity cost is. And I've wavered
>> back and forth on whether invertibility is worth it for the past few
>> years. The way I'm looking at designing operations at the moment is
>> something like this:
>>
>> Given a doc: {x:5, y:[1,2,3]}:
>> Pickup phase: go in key x, delete, skip to key y, go in key 1, pick up
>> (id=0)
>> Edit phase: go in key y.0. Add 10
>> Drop phase: go to key x, insert "hi", go in key z, drop id=0
>>
>> -> The document will be {x:"hi", y:[11,3], z:2}
>>
>> If this works out, inverting the operation *should be* really easy -
>> we just swap the pickup phase & drop phase, then invert everything in
>> the edit phase. We *don't* need a copy of moved data, we just need to
>> swap the source and destination. I'm thinking that if you explicitly
>> want to overwrite something, you'll need to delete it in the pickup
>> phase before you can move something over the top of it. Then the
>> semantics of conflicts-on-insert will match the semantics of
>> conflict-on-move. But that also means that only deletes would need to
>> specify what was deleted to make invertibility work.
>>
>> I don't know if this will actually work out in practice - I'm writing
>> the transform function now and I may end up changing the data model
>> somewhat.
>>
>> I also want a couple more properties:
>>
>> - I want to be able to embed subtypes for things like rich text - and
>> those subtypes aren't necessarily invertible themselves.
>> - I want to make a TP2 version of this JSON type, so we can make it
>> into a p2p federated wave-like structure at some point. That type will
>> not be invertible because of the requirements of tp2 semantics.
>>
>> So, I'm not sure.
>>
>> >> But we could implement this much more cleanly if move operations
>> >> specify a fallback location for conflicting data to get moved into. If
>> >> that fallback location was a list, we can guarantee we won't have any
>> >> cascaded conflicts. To resolve the conflict, the user can just move or
>> >> delete the conflicted object like normal.
>> >>
>> >
>> > Out of interest, is this something that conflicting 'set' operations
>> would
>> > want too?
>> > e.g. (Set 'c' {v: 1}) & (Set 'c' {v: 2}) seems to have similar loss
>> > semantics,
>> > should _recovery then be [{set: 'c', data: {v: 1}}]
>>
>> Yep absolutely.
>>
>> >> Some observations:
>> >> >> - This will almost never actually happen in real life.
>> >> >> - Even if we turn the operation will transform into a 'replace',
we
>> >> >> could add a flag to the operation to mark that it was unintentionally
>> >> >> overwritten, and then the process which is making the corresponding
>> >> >> changes on disk could see that and do some special
>> >> >> application-specific behaviour instead of actually deleting the
file.
>> >> >
>> >> >
>> >> > Indeed - your structure could instead be:
>> >> > { "name": "librope",
>> >> >   "package": {
>> >> >     "src": {
>> >> >       "rope.c": "somelonguniqueid",
>> >> >       "rope.h": "anotherfileid",
>> >> >   }},
>> >> >   data: {
>> >> >     "somelonguniqueid": { "text": "#include <rope.h>\n ..." },
>> >> >     "anotherfileid": { "text": "// This is a cool header"  },
>> >> >   }}
>> >> >
>> >> > Then deletion is some separate offline index/cleanup, and resolving
>> >> > conflicts is simpler.
>> >> > (although 'move' is less useful here as there are no child
>> properties).
>> >>
>> >> Yeah - thats definitely usable. It kind of moves the mess around your
>> >> plate a little. Its very similar to the idea of just not allowing
>> >> move-to-object operations at all, which would force every
>> >> dictionary-like structure to be managed using lists with manual
>> >> (custom!) dedup in the case of conflicts. Either way, application
>> >> writers would need to do some manual garbage collection.
>> >
>> >
>> > Yep, I was assuming everyone wants manual garbage collection anyway :)
>> > I realized though that moving folders messes this up a bit;
>> > either the folders are also files (possible, but then every N-deep folder
>> > traversal does N id lookups)
>> > or they are objects (e.g. "src": {"js": { "file1": id1, "file2": id2,
>> > ...}}) in which case you're back to the original problem.
>>
>> Yeah exactly. I think the merge conflict semantics I suggested above
>> are a decent fit for this problem, and will make things a lot easier.
>>
>> >> Does move distinguish between PUT vs. PATCH?
>> >> > I would assume PUT (x: {a: 1})  -> z and PUT (y: {b: 2}) -> z
would
>> end
>> >> up
>> >> > in z: {b:2}
>> >> > but if the second operation received was a PATCH operation, they
>> would be
>> >> > merged rather than replaced.
>> >> > For file system moves they'd probably be PUT, but their might be
>> >> instances
>> >> > where a PATCH is useful.
>> >>
>> >> I was just thinking of move as a PUT operation. When is PATCH useful?
>> >> In what use cases do you actually want two object structures to get
>> >> merged?
>> >>
>> >
>> > Only thing I could think of was batch file moves - i.e. moving a set of
>> > string-id'd files from one directory to another,
>> > you'd want to PATCH move {'f1.js': {...}, 'f2.js': {...}} from src ->
>> dest
>> > but then it's a bit weird, as you'd want PATCH on the top level and PUT
>> > underneath, and it'd probably be better off as a list of PUT moves.
>> > Otherwise, I couldn't think of one, unless a client wanted to get really
>> > abstract and e.g. batch a 'local edits' object which then get 'applied'
>> > by a PATCH move onto the persisted object, but that also sounds like
>> > something that could be done better in other ways.
>>
>> Yeah.... I'm just going to leave this out for now. If there is a
>> compelling use case, A special OT merge operation would be strictly
>> better than manually merging all the keys, but if we can't think of
>> any actual use for it, I'll save myself the trouble.
>>
>> -J
>>

On Wed, Dec 3, 2014 at 1:46 AM, Alex North <alex@alexn.id.au> wrote:
> <blast from="the-past">
>
> Joseph's earlier comments suggest you already realise this, but Wave
> implements map structures as list-of-pairs, so conflicting operations end
> up both surviving at the protocol level, with a consistent order reflecting
> their server order, so that conflict resolution can be left to the
> application. It's a bit obscure because the list-of-pairs is embedded in
> the wave doc structure, which isn't immediately obvious as essentially
> nested lists. This storage/transport representation doesn't constrain the
> conceptual application models built on top of it, where the data may then
> be converted to a HashMap, JS object etc via application logic.
> Occasionally we used element attributes as a model if last-one-wins was
> right for the application.
>
> We thought lots about this for different cases and this reflects a general
> rule we discovered: you can't design protocol-level conflict resolution in
> a re-useable way to suit very many different applications. Something that
> works well for one application will confound another. Adding complexity at
> this level pretty much always turned out poorly, and where we needed a tie
> break it went in favour of document editing (e.g. cursors after concurrent
> insertion). Hence both lack of a map primitive and the lack of a move
> operation in favour of delete/insert.
>
> P.S. Joseph, I hear your in Sydney. Look forward to seeing you.
>
> </blast>
>
> On Thu Oct 23 2014 at 6:08:38 AM Joseph Gentle <josephg@gmail.com> wrote:
>
>> On Wed, Oct 22, 2014 at 2:50 AM, Patrick Coleman <patcoleman@google.com>
>> wrote:
>> >>
>> >> - I'm not sure if I want to make this OT type invertable. If we don't,
>> >
>> > What are the costs of making it invertable? i.e. are there particular
>> > reasons not to?
>> > I guess for proper invertability, you'd need to copy all the data being
>> > moved?
>>
>> I honestly don't know what the complexity cost is. And I've wavered
>> back and forth on whether invertibility is worth it for the past few
>> years. The way I'm looking at designing operations at the moment is
>> something like this:
>>
>> Given a doc: {x:5, y:[1,2,3]}:
>> Pickup phase: go in key x, delete, skip to key y, go in key 1, pick up
>> (id=0)
>> Edit phase: go in key y.0. Add 10
>> Drop phase: go to key x, insert "hi", go in key z, drop id=0
>>
>> -> The document will be {x:"hi", y:[11,3], z:2}
>>
>> If this works out, inverting the operation *should be* really easy -
>> we just swap the pickup phase & drop phase, then invert everything in
>> the edit phase. We *don't* need a copy of moved data, we just need to
>> swap the source and destination. I'm thinking that if you explicitly
>> want to overwrite something, you'll need to delete it in the pickup
>> phase before you can move something over the top of it. Then the
>> semantics of conflicts-on-insert will match the semantics of
>> conflict-on-move. But that also means that only deletes would need to
>> specify what was deleted to make invertibility work.
>>
>> I don't know if this will actually work out in practice - I'm writing
>> the transform function now and I may end up changing the data model
>> somewhat.
>>
>> I also want a couple more properties:
>>
>> - I want to be able to embed subtypes for things like rich text - and
>> those subtypes aren't necessarily invertible themselves.
>> - I want to make a TP2 version of this JSON type, so we can make it
>> into a p2p federated wave-like structure at some point. That type will
>> not be invertible because of the requirements of tp2 semantics.
>>
>> So, I'm not sure.
>>
>> >> But we could implement this much more cleanly if move operations
>> >> specify a fallback location for conflicting data to get moved into. If
>> >> that fallback location was a list, we can guarantee we won't have any
>> >> cascaded conflicts. To resolve the conflict, the user can just move or
>> >> delete the conflicted object like normal.
>> >>
>> >
>> > Out of interest, is this something that conflicting 'set' operations
>> would
>> > want too?
>> > e.g. (Set 'c' {v: 1}) & (Set 'c' {v: 2}) seems to have similar loss
>> > semantics,
>> > should _recovery then be [{set: 'c', data: {v: 1}}]
>>
>> Yep absolutely.
>>
>> >> Some observations:
>> >> >> - This will almost never actually happen in real life.
>> >> >> - Even if we turn the operation will transform into a 'replace',
we
>> >> >> could add a flag to the operation to mark that it was unintentionally
>> >> >> overwritten, and then the process which is making the corresponding
>> >> >> changes on disk could see that and do some special
>> >> >> application-specific behaviour instead of actually deleting the
file.
>> >> >
>> >> >
>> >> > Indeed - your structure could instead be:
>> >> > { "name": "librope",
>> >> >   "package": {
>> >> >     "src": {
>> >> >       "rope.c": "somelonguniqueid",
>> >> >       "rope.h": "anotherfileid",
>> >> >   }},
>> >> >   data: {
>> >> >     "somelonguniqueid": { "text": "#include <rope.h>\n ..." },
>> >> >     "anotherfileid": { "text": "// This is a cool header"  },
>> >> >   }}
>> >> >
>> >> > Then deletion is some separate offline index/cleanup, and resolving
>> >> > conflicts is simpler.
>> >> > (although 'move' is less useful here as there are no child
>> properties).
>> >>
>> >> Yeah - thats definitely usable. It kind of moves the mess around your
>> >> plate a little. Its very similar to the idea of just not allowing
>> >> move-to-object operations at all, which would force every
>> >> dictionary-like structure to be managed using lists with manual
>> >> (custom!) dedup in the case of conflicts. Either way, application
>> >> writers would need to do some manual garbage collection.
>> >
>> >
>> > Yep, I was assuming everyone wants manual garbage collection anyway :)
>> > I realized though that moving folders messes this up a bit;
>> > either the folders are also files (possible, but then every N-deep folder
>> > traversal does N id lookups)
>> > or they are objects (e.g. "src": {"js": { "file1": id1, "file2": id2,
>> > ...}}) in which case you're back to the original problem.
>>
>> Yeah exactly. I think the merge conflict semantics I suggested above
>> are a decent fit for this problem, and will make things a lot easier.
>>
>> >> Does move distinguish between PUT vs. PATCH?
>> >> > I would assume PUT (x: {a: 1})  -> z and PUT (y: {b: 2}) -> z
would
>> end
>> >> up
>> >> > in z: {b:2}
>> >> > but if the second operation received was a PATCH operation, they
>> would be
>> >> > merged rather than replaced.
>> >> > For file system moves they'd probably be PUT, but their might be
>> >> instances
>> >> > where a PATCH is useful.
>> >>
>> >> I was just thinking of move as a PUT operation. When is PATCH useful?
>> >> In what use cases do you actually want two object structures to get
>> >> merged?
>> >>
>> >
>> > Only thing I could think of was batch file moves - i.e. moving a set of
>> > string-id'd files from one directory to another,
>> > you'd want to PATCH move {'f1.js': {...}, 'f2.js': {...}} from src ->
>> dest
>> > but then it's a bit weird, as you'd want PATCH on the top level and PUT
>> > underneath, and it'd probably be better off as a list of PUT moves.
>> > Otherwise, I couldn't think of one, unless a client wanted to get really
>> > abstract and e.g. batch a 'local edits' object which then get 'applied'
>> > by a PATCH move onto the persisted object, but that also sounds like
>> > something that could be done better in other ways.
>>
>> Yeah.... I'm just going to leave this out for now. If there is a
>> compelling use case, A special OT merge operation would be strictly
>> better than manually merging all the keys, but if we can't think of
>> any actual use for it, I'll save myself the trouble.
>>
>> -J
>>

Mime
View raw message