couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Newson <rnew...@apache.org>
Subject Re: # [DISCUSS] : things we need to solve/decide : storage of edit conflicts
Date Thu, 07 Feb 2019 21:52:54 GMT
I think we should choose simple. We can then see if performance is too low or storage overhead
too high and then see what we can do about it.

B.

-- 
  Robert Samuel Newson
  rnewson@apache.org

On Thu, 7 Feb 2019, at 20:36, Ilya Khlopotov wrote:
> We cannot do simple thing if we want to support sharing of JSON terms. I 
> think if we want the simplest path we should move sharing out of the 
> scope. The problem with sharing is we need to know the location of 
> shared terms when we do write. This means that we have to read full 
> document on every write. There might be tricks to replace full document 
> read with some sort of hierarchical signature or sketch of a document. 
> However these tricks do not fall into simplest solution category. We 
> need to choose the design goals:
> - simple
> - performance
> - reduced storage overhead
> 
> best regards,
> iilyak
> 
> On 2019/02/07 12:45:34, Garren Smith <garren@apache.org> wrote: 
> > I’m also in favor of keeping it really simple and then testing and
> > measuring it.
> > 
> > What is the best way to measure that we have something that works? I’m not
> > sure just relying on our current tests will prove that? Should we define
> > and build some more complex situations e.g docs with lots of conflicts or
> > docs with wide revisions and make sure we can solve for those?
> > 
> > On Thu, Feb 7, 2019 at 12:33 PM Jan Lehnardt <jan@apache.org> wrote:
> > 
> > > I’m also very much in favour with starting with the simplest thing that
> > > can possibly work and doesn’t go against the advertised best practices of
> > > FoundationDB. Let’s get that going and get a feel for how it all works
> > > together, before trying to optimise things we can’t measure yet.
> > >
> > > Best
> > > Jan
> > > —
> > >
> > > > On 6. Feb 2019, at 16:58, Robert Samuel Newson <rnewson@apache.org>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > With the Redwood storage engine under development and with prefix
> > > elision part of its design, I don’t think we should get too hung up on
> > > adding complications and indirections in the key space just yet. We haven’t
> > > written a line of code or run any tests, this is premature optimisation.
> > > >
> > > > I’d like to focus on the simplest solution that yields all required
> > > properties. We can embellish later (if warranted).
> > > >
> > > > I am intrigued by all the ideas that might allow us cheaper inserts and
> > > updates than the current code where there are multiple edit branches in the
> > > stored document.
> > > >
> > > > B.
> > > >
> > > >> On 6 Feb 2019, at 02:18, Ilya Khlopotov <iilyak@apache.org>
wrote:
> > > >>
> > > >> While reading Adam's proposal I came to realize that: we don't have
to
> > > calculate winning revision at read time.
> > > >> Since FDB's transactions are atomic we can calculate it when we write.
> > > This means we can just write latest values into separate range. This makes
> > > lookup of latest version fast.
> > > >> Another realization is if we want to share values for some json paths
> > > we would have to introduce a level of indirection.
> > > >> Bellow is the data model inspired by Adam's idea to share json_paths.
> > > In this model the json_path is stored in the revision where it was first
> > > added (we call that revision an owner of a json_path). The values for
> > > json_path key can be scalar values, parts of scalar values or pointers to
> > > owner location.
> > > >> The below snippets are sketches of transactions.
> > > >> The transactions will include updates to other keys as needed
> > > (`external_size`, `by_seq` and so on).  The revision tree management is not
> > > covered yet.
> > > >> The `rev -> vsn` indirection is not strictly required. It is added
> > > because it saves some space since `rev` is a long string and `vsn` is FDB
> > > versionstamp of fixed size.
> > > >>
> > > >> - `{NS} / {docid} / _by_rev / {rev} = vsn`
> > > >> - `{NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL`
> > > >> - `{NS} / {docid} / _data / {json_path} = latest_value | part`
> > > >> - `{NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > > {another_vsn}`
> > > >>
> > > >> ```
> > > >> write(txn, doc_id, prev_rev, json):
> > > >>  txn.add_write_conflict_key("{NS} / {doc_id} / _rev")
> > > >>  rev = generate_new_rev()
> > > >>  txn["{NS} / {docid} / _by_rev / {rev}"] = vsn
> > > >>  for every json_path in flattened json
> > > >>    - {NS} / {docid} / _used_by / {json_path} / {another_vsn} = NIL
> > > >>    if rev is HEAD:
> > > >>      # this range contains values for all json paths for the latest
> > > revision (read optimization)
> > > >>      - {NS} / {docid} / _data / {json_path} = latest_value | part
> > > >>    - {NS} / {docid} / {vsn} / _data / {json_path} = value | part |
> > > {another_vsn}
> > > >>  txn["{NS} / {doc_id} / _rev"] = rev
> > > >>
> > > >> get_current(txn, doc_id):
> > > >> # there is no sharing of json_paths in this range (read optimization)
> > > >> txn.get_range("{NS} / {docid} / _data / 0x00", "{NS} / {docid} / _data
> > > / 0xFF" )
> > > >>
> > > >> get_revision(txn, doc_id, rev):
> > > >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> > > >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> > > "{NS} / {vsn} / {docid} / _data / 0xFF" )
> > > >> for every json_path in json_paths:
> > > >>   if value has type vsn:
> > > >>      another_vsn = value
> > > >>         value = txn["{NS} / {docid} / {another_vsn} / _data /
> > > {json_path}"]
> > > >>   result[json_path] = value
> > > >>
> > > >> delete_revision(txn, doc_id, rev):
> > > >> vsn = txn["{NS} / {docid} / _by_rev / {rev}"]
> > > >> json_paths = txn.get_range("{NS} / {vsn} / {docid} / _data / 0x00",
> > > "{NS} / {vsn} / {docid} / _data / 0xFF" )
> > > >> for every json_path in json_paths:
> > > >>   if value has type vsn:
> > > >>     # remove reference to deleted revision from the owner
> > > >>      del txn[{NS} / {docid} / _used_by / {json_path} / {vsn}]
> > > >>   # check if deleted revision of json_path is not used by anything
else
> > > >>   if txn.get_range("{NS} / {docid} / _used_by / {json_path} / {vsn}",
> > > limit=1) == []:
> > > >>      del txn["{NS} / {docid} / {vsn} / _data / {json_path}"]
> > > >>   if vsn is HEAD:
> > > >>      copy range for winning revision into "{NS} / {docid} / _data
/
> > > {json_path}"
> > > >> ```
> > > >>
> > > >> best regards,
> > > >> iilyak
> > > >>
> > > >> On 2019/02/04 23:22:09, Adam Kocoloski <kocolosk@apache.org>
wrote:
> > > >>> I think it’s fine to start a focused discussion here as it might
help
> > > inform some of the broader debate over in that thread.
> > > >>>
> > > >>> As a reminder, today CouchDB writes the entire body of each document
> > > revision on disk as a separate blob. Edit conflicts that have common fields
> > > between them do not share any storage on disk. The revision tree is encoded
> > > into a compact format and a copy of it is stored directly in both the by_id
> > > tree and the by_seq tree. Each leaf entry in the revision tree contain a
> > > pointer to the position of the associated doc revision on disk.
> > > >>>
> > > >>> As a further reminder, CouchDB 2.x clusters can generate edit
conflict
> > > revisions just from multiple clients concurrently updating the same
> > > document in a single cluster. This won’t happen when FoundationDB is
> > > running under the hood, but users who deploy multiple CouchDB or PouchDB
> > > servers and replicate between them can of course still produce conflicts
> > > just like they could in CouchDB 1.x, so we need a solution.
> > > >>>
> > > >>> Let’s consider the two sub-topics separately: 1) storage of
edit
> > > conflict bodies and 2) revision trees
> > > >>>
> > > >>> ## Edit Conflict Storage
> > > >>>
> > > >>> The simplest possible solution would be to store each document
> > > revision separately, like we do today. We could store document bodies with
> > > (“docid”, “revid”) as the key prefix, and each transaction could clear
the
> > > key range associated with the base revision against which the edit is being
> > > attempted. This would work, but I think we can try to be a bit more clever
> > > and save on storage space given that we’re splitting JSON documents into
> > > multiple KV pairs.
> > > >>>
> > > >>> One thought I’d had is to introduce a special enum Value which
> > > indicates that the subtree “beneath” the given Key is in conflict. For
> > > example, consider the documents
> > > >>>
> > > >>> {
> > > >>>   “_id”: “foo”,
> > > >>>   “_rev”: “1-abc”,
> > > >>>   “owner”: “alice”,
> > > >>>   “active”: true
> > > >>> }
> > > >>>
> > > >>> and
> > > >>>
> > > >>> {
> > > >>>   “_id”: “foo”,
> > > >>>   “_rev”: “1-def”,
> > > >>>   “owner”: “bob”,
> > > >>>   “active”: true
> > > >>> }
> > > >>>
> > > >>> We could represent these using the following set of KVs:
> > > >>>
> > > >>> (“foo”, “active”) = true
> > > >>> (“foo”, “owner”) = kCONFLICT
> > > >>> (“foo”, “owner”, “1-abc”) = “alice”
> > > >>> (“foo”, “owner”, “1-def”) = “bob”
> > > >>>
> > > >>> This approach also extends to conflicts where the two versions
have
> > > different data types. Consider a more complicated example where bob dropped
> > > the “active” field and changed the “owner” field to an object:
> > > >>>
> > > >>> {
> > > >>> “_id”: “foo”,
> > > >>> “_rev”: “1-def”,
> > > >>> “owner”: {
> > > >>>   “name”: “bob”,
> > > >>>   “email”: “bob@example.com"
> > > >>> }
> > > >>> }
> > > >>>
> > > >>> Now the set of KVs for “foo” looks like this (note that a
missing
> > > field needs to be handled explicitly):
> > > >>>
> > > >>> (“foo”, “active”) = kCONFLICT
> > > >>> (“foo”, “active”, “1-abc”) = true
> > > >>> (“foo”, “active”, “1-def”) = kMISSING
> > > >>> (“foo”, “owner”) = kCONFLICT
> > > >>> (“foo”, “owner”, “1-abc”) = “alice”
> > > >>> (“foo”, “owner”, “1-def”, “name”) = “bob”
> > > >>> (“foo”, “owner”, “1-def”, “email”) = “bob@example.com”
> > > >>>
> > > >>> I like this approach for the common case where documents share
most of
> > > their data in common but have a conflict in a very specific field or set of
> > > fields.
> > > >>>
> > > >>> I’ve encountered one important downside, though: an edit that
> > > replicates in and conflicts with the entire document can cause a bit of a
> > > data explosion. Consider a case where I have 10 conflicting versions of a
> > > 100KB document, but the conflicts are all related to a single scalar value.
> > > Now I replicate in an empty document, and suddenly I have a kCONFLICT at
> > > the root. In this model I now need to list out every path of every one of
> > > the 10 existing revisions and I end up with a 1MB update. Yuck. That’s
> > > technically no worse in the end state than the “zero sharing” case above,
> > > but one could easily imagine overrunning the transaction size limit this
> > > way.
> > > >>>
> > > >>> I suspect there’s a smart path out of this. Maybe the system
detects a
> > > “default” value for each field and uses that instead of writing out the
> > > value for every revision in a conflicted subtree. Worth some discussion.
> > > >>>
> > > >>> ## Revision Trees
> > > >>>
> > > >>> In CouchDB we currently represent revisions as a hash history
tree;
> > > each revision identifier is derived from the content of the revision
> > > including the revision identifier of its parent. Individual edit branches
> > > are bounded in *length* (I believe the default is 1000 entries), but the
> > > number of edit branches is technically unbounded.
> > > >>>
> > > >>> The size limits in FoundationDB preclude us from storing the entire
> > > key tree as a single value; in pathological situations the tree could
> > > exceed 100KB. Rather, I think it would make sense to store each edit
> > > *branch* as a separate KV. We stem the branch long before it hits the value
> > > size limit, and in the happy case of no edit conflicts this means we store
> > > the edit history metadata in a single KV. It also means that we can apply
> > > an interactive edit without retrieving the entire conflicted revision tree;
> > > we need only retrieve and modify the single branch against which the edit
> > > is being applied. The downside is that we duplicate historical revision
> > > identifiers shared by multiple edit branches, but I think this is a
> > > worthwhile tradeoff.
> > > >>>
> > > >>> I would furthermore try to structure the keys so that it is possible
> > > to retrieve the “winning” revision in a single limit=1 range query. Ideally
> > > I’d like to proide the following properties:
> > > >>>
> > > >>> 1) a document read does not need to retrieve the revision tree
at all,
> > > just the winning revision identifier (which would be stored with the rest
> > > of the doc)
> > > >>> 2) a document update only needs to read the edit branch of the
> > > revision tree against which the update is being applied, and it can read
> > > that branch immediately knowing only the content of the edit that is being
> > > attempted (i.e., it does not need to read the current version of the
> > > document itself).
> > > >>>
> > > >>> So, I’d propose a separate subspace (maybe “_meta”?) for
the revision
> > > trees, with keys and values that look like
> > > >>>
> > > >>> (“_meta”, DocID, IsDeleted, RevPosition, RevHash) = [ParentRev,
> > > GrandparentRev, …]
> > > >>>
> > > >>> The inclusion of IsDeleted, RevPosition and RevHash in the key
should
> > > be sufficient (with the right encoding) to create a range query that
> > > automatically selects the “winner” according to CouchDB’s arcane rules,
> > > which are something like
> > > >>>
> > > >>> 1) deleted=false beats deleted=true
> > > >>> 2) longer paths (i.e. higher RevPosition) beat shorter ones
> > > >>> 3) RevHashes with larger binary values beat ones with smaller
values
> > > >>>
> > > >>> ===========
> > > >>>
> > > >>> OK, that’s all on this topic from me for now. I think this is
a
> > > particularly exciting area where we start to see the dividends of splitting
> > > up data into multiple KV pairs in FoundationDB :) Cheers,
> > > >>>
> > > >>> Adam
> > > >>>
> > > >>>
> > > >>>> On Feb 4, 2019, at 2:41 PM, Robert Newson <rnewson@apache.org>
wrote:
> > > >>>>
> > > >>>> This one is quite tightly coupled to the other thread on data
model,
> > > should we start much conversation here before that one gets closer to a
> > > solution?
> > > >>>>
> > > >>>> --
> > > >>>> Robert Samuel Newson
> > > >>>> rnewson@apache.org
> > > >>>>
> > > >>>> On Mon, 4 Feb 2019, at 19:25, Ilya Khlopotov wrote:
> > > >>>>> This is a beginning of a discussion thread about storage
of edit
> > > >>>>> conflicts and everything which relates to revisions.
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>>
> > > >
> > >
> > > --
> > > Professional Support for Apache CouchDB:
> > > https://neighbourhood.ie/couchdb-support/
> > >
> > >
> > 

Mime
View raw message