couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [DISCUSS] : things we need to solve/decide : storing JSON documents
Date Wed, 20 Feb 2019 00:39:55 GMT
Assuming we have multiple competing implementations performance
characterization of each is going to have to be a requirement, right?

Theoretically this would all be at the HTTP layer such that we can swap
things in and out and easily check total system performance and not
duplicate effort and get distracted by pineapple vs recliner comparisons.

The theory being that we’d focus on observable differences weighed against
implementation complexity against feature “ability” in terms of what
various designs might offer.

I’ve been working on Erlang bindings for FoundationDB for a bit and would
really like to bring their approach to testing up through the rest of
CouchDB. Its very reminiscent of property based testing though slightly
less formal.  But it certainly finds bugs. Anything we do regardless of
peromance  I think should be accompanied by a similarly thoroughbtest suite.


On Tue, Feb 19, 2019 at 5:45 PM Joan Touzet <wohali@apache.org> wrote:

> Would it be too much work to prototype both and check CRUD timings for
> each across a small variety of documents?
>
> -Joan
>
> ----- Original Message -----
> > From: "Paul Davis" <paul.joseph.davis@gmail.com>
> > To: dev@couchdb.apache.org
> > Sent: Tuesday, February 19, 2019 5:41:23 PM
> > Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON
> documents
> >
> > A simple doc storage version number would likely be enough for future
> > us to
> > do fancier things.
> >
> > On Tue, Feb 19, 2019 at 4:16 PM Benjamin Anderson
> > <banjiewen@apache.org>
> > wrote:
> >
> > > > I don’t think adding a layer of abstraction is the right move
> > > > just yet,
> > > I think we should continue to find consensus on one answer to this
> > > question
> > >
> > > Agree that the theorycrafting stage is not optimal for making
> > > abstraction decisions, but I suspect it would be worthwhile
> > > somewhere
> > > between prototyping and releasing. Adam's proposal does seem to me
> > > the
> > > most appealing approach on the surface, and I don't see anyone
> > > signing
> > > up to do the work to deliver an alternative concurrently.
> > >
> > > --
> > > ba
> > >
> > > On Tue, Feb 19, 2019 at 1:43 PM Robert Samuel Newson
> > > <rnewson@apache.org>
> > > wrote:
> > > >
> > > > Addendum: By “directory aliasing” I meant within a document
> > > > (either the
> > > actual Directory thing or something equivalent of our own making).
> > > The
> > > directory aliasing for each database is a good way to reduce key
> > > size
> > > without a significant cost. Though if Redwood lands in time, even
> > > this
> > > would become an inutile obfuscation].
> > > >
> > > > > On 19 Feb 2019, at 21:39, Robert Samuel Newson
> > > > > <rnewson@apache.org>
> > > wrote:
> > > > >
> > > > > Interesting suggestion, obviously the details might get the
> > > > > wrong kind
> > > of fun.
> > > > >
> > > > > Somewhere above I suggested this would be something we could
> > > > > change
> > > over time and even use different approaches for different documents
> > > within
> > > the same database. This is the long way of saying there are
> > > multiple ways
> > > to do this each with advantages and none without disadvantages.
> > > > >
> > > > > I don’t think adding a layer of abstraction is the right move
> > > > > just
> > > yet, I think we should continue to find consensus on one answer to
> > > this
> > > question (and the related ones in other threads) for the first
> > > release.
> > > It’s easy to say “we can change it later”, of course. We can,
> > > though it
> > > would be a chunk of work in the context of something that already
> > > works,
> > > I’ve rarely seen anyone sign up for that.
> > > > >
> > > > > I’m fine with the first proposal from Adam, where the keys are
> > > > > tuples
> > > of key parts pointing at terminal values. To make it easier for the
> > > first
> > > version, I would exclude optimisations like deduplication or the
> > > Directory
> > > aliasing or the schema thing that I suggested and that Ilya
> > > incorporated a
> > > variant of in a follow-up post. We’d accept that there are limits
> > > on the
> > > sizes of documents, including the awkward-to-express one about
> > > property
> > > depth.
> > > > >
> > > > > Stepping back, I’m not seeing any essential improvement over
> > > > > Adam’s
> > > original proposal besides the few corrections and clarifications
> > > made by
> > > various authors. Could we start an RFC based on Adam’s original
> > > proposal on
> > > document body, revision tree and index storage? We could then have
> > > PR’s
> > > against that for each additional optimisation (one person’s
> > > optimisation is
> > > another person’s needless complication)?
> > > > >
> > > > > If I’ve missed some genuine advance on the original proposal in
> > > > > this
> > > long thread, please call it out for me.
> > > > >
> > > > > B.
> > > > >
> > > > >> On 19 Feb 2019, at 21:15, Benjamin Anderson
> > > > >> <banjiewen@apache.org>
> > > wrote:
> > > > >>
> > > > >> As is evident by the length of this thread, there's a pretty
> > > > >> big
> > > > >> design space to cover here, and it seems unlikely we'll have
> > > > >> arrived
> > > > >> at a "correct" solution even by the time this thing ships.
> > > > >> Perhaps it
> > > > >> would be worthwhile to treat the in-FDB representation of data
> > > > >> as a
> > > > >> first-class abstraction and support multiple representations
> > > > >> simultaneously?
> > > > >>
> > > > >> Obviously there's no such thing as a zero-cost abstraction -
> > > > >> and I've
> > > > >> not thought very hard about how far up the stack the document
> > > > >> representation would need to leak - but supporting different
> > > > >> layouts
> > > > >> (primarily, as Adam points out, on the document body itself)
> > > > >> might
> > > > >> prove interesting and useful. I'm sure there are folks
> > > > >> interested in a
> > > > >> column-shaped CouchDB, for example.
> > > > >>
> > > > >> --
> > > > >> b
> > > > >>
> > > > >> On Tue, Feb 19, 2019 at 11:39 AM Robert Newson
> > > > >> <rnewson@apache.org>
> > > wrote:
> > > > >>>
> > > > >>> Good points on revtree, I agree with you we should store that
> > > intelligently to gain the benefits you mentioned.
> > > > >>>
> > > > >>> --
> > > > >>> Robert Samuel Newson
> > > > >>> rnewson@apache.org
> > > > >>>
> > > > >>> On Tue, 19 Feb 2019, at 18:41, Adam Kocoloski wrote:
> > > > >>>> I do not think we should store the revtree as a blob. The
> > > > >>>> design
> > > where
> > > > >>>> each edit branch is its own KV should save on network IO and
> > > > >>>> CPU
> > > cycles
> > > > >>>> for normal updates. We’ve performed too many heroics to keep
> > > > >>>> couch_key_tree from stalling entire databases when trying to
> > > > >>>> update
> > > a
> > > > >>>> single document with a wide revision tree, I would much
> > > > >>>> prefer to
> > > ignore
> > > > >>>> other edit branches entirely when all we’re doing is
> > > > >>>> extending one
> > > of
> > > > >>>> them.
> > > > >>>>
> > > > >>>> I also do not think we should store JSON documents as blobs,
> > > > >>>> but
> > > it’s a
> > > > >>>> closer call. Some of my reasoning for preferring the
> > > > >>>> exploded path
> > > > >>>> design:
> > > > >>>>
> > > > >>>> - it lends itself nicely to sub-document operations, for
> > > > >>>> which Jan
> > > > >>>> crafted an RFC last year:
> > > https://github.com/apache/couchdb/issues/1559
> > > > >>>> - it optimizes the creation of Mango indexes on existing
> > > > >>>> databases
> > > since
> > > > >>>> we only need to retrieve the value(s) we want to index
> > > > >>>> - it optimizes Mango queries that use field selectors
> > > > >>>> - anyone who wanted to try their hand at GraphQL will find
> > > > >>>> it very
> > > > >>>> handy: https://github.com/apache/couchdb/issues/1499
> > > > >>>> - looking further ahead, it lets us play with smarter leaf
> > > > >>>> value
> > > types
> > > > >>>> like Counters (yes I’m still on the CRDT bandwagon, sorry)
> > > > >>>>
> > > > >>>> A few comments on the thread:
> > > > >>>>
> > > > >>>>>>> * Most documents bodies are probably going to be smaller
> > > > >>>>>>> than
> > > 100k. So in
> > > > >>>>>>> the majority of case it would be one write / one read to
> > > > >>>>>>> update
> > > and fetch
> > > > >>>>>>> the document body.
> > > > >>>>
> > > > >>>> We should test, but I expect reading 50KB of data in a range
> > > > >>>> query
> > > is
> > > > >>>> almost as efficient as reading a single 50 KB value.
> > > > >>>> Similarly,
> > > writes
> > > > >>>> to a contiguous set of keys should be quite efficient.
> > > > >>>>
> > > > >>>> I am concerned about the overhead of the repeated field
> > > > >>>> paths in the
> > > > >>>> keys with the exploded path option in the absence of key
> > > > >>>> prefix
> > > > >>>> compression. That would be my main reason to acquiesce and
> > > > >>>> throw
> > > away
> > > > >>>> all the document structure.
> > > > >>>>
> > > > >>>> Adam
> > > > >>>>
> > > > >>>>> On Feb 19, 2019, at 12:04 PM, Robert Newson
> > > > >>>>> <rnewson@apache.org>
> > > wrote:
> > > > >>>>>
> > > > >>>>> I like the idea that we'd reuse the same pattern (but
> > > > >>>>> perhaps not
> > > the same _code_) for doc bodies, revtree and attachments.
> > > > >>>>>
> > > > >>>>> I hope we still get to delete couch_key_tree.erl, though.
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Robert Samuel Newson
> > > > >>>>> rnewson@apache.org
> > > > >>>>>
> > > > >>>>> On Tue, 19 Feb 2019, at 17:03, Jan Lehnardt wrote:
> > > > >>>>>> I like the idea from a “trying a simple thing first”
> > > > >>>>>> perspective,
> > > but
> > > > >>>>>> Nick’s points below are especially convincing to with this
> > > > >>>>>> for
> > > now.
> > > > >>>>>>
> > > > >>>>>> Best
> > > > >>>>>> Jan
> > > > >>>>>> —
> > > > >>>>>>
> > > > >>>>>>> On 19. Feb 2019, at 17:53, Nick Vatamaniuc
> > > > >>>>>>> <vatamane@gmail.com>
> > > wrote:
> > > > >>>>>>>
> > > > >>>>>>> Hi,
> > > > >>>>>>>
> > > > >>>>>>> Sorry for jumping in so late, I was following from the
> > > > >>>>>>> sidelines
> > > mostly. A
> > > > >>>>>>> lot of good discussion happening and am excited about the
> > > possibilities
> > > > >>>>>>> here.
> > > > >>>>>>>
> > > > >>>>>>> I do like the simpler "chunking" approach for a few
> > > > >>>>>>> reasons:
> > > > >>>>>>>
> > > > >>>>>>> * Most documents bodies are probably going to be smaller
> > > > >>>>>>> than
> > > 100k. So in
> > > > >>>>>>> the majority of case it would be one write / one read to
> > > > >>>>>>> update
> > > and fetch
> > > > >>>>>>> the document body.
> > > > >>>>>>>
> > > > >>>>>>> * We could reuse the chunking code for attachment
> > > > >>>>>>> handling and
> > > possibly
> > > > >>>>>>> revision key trees. So it's the general pattern of upload
> > > > >>>>>>> chunks
> > > to some
> > > > >>>>>>> prefix, and when finished flip an atomic toggle to make
> > > > >>>>>>> it
> > > current.
> > > > >>>>>>>
> > > > >>>>>>> * Do the same thing with revision trees and we could
> > > > >>>>>>> re-use the
> > > revision
> > > > >>>>>>> tree manipulation logic. That is, the key tree in most
> > > > >>>>>>> cases
> > > would be small
> > > > >>>>>>> enough to fit in 100k but if they get huge, they'd get
> > > > >>>>>>> chunked.
> > > This would
> > > > >>>>>>> allow us to reuse all the battle tested couch_key_tree
> > > > >>>>>>> code
> > > mostly as is.
> > > > >>>>>>> We even have property tests for it
> > > > >>>>>>>
> > >
> https://github.com/apache/couchdb/blob/master/src/couch/test/couch_key_tree_prop_tests.erl
> > > > >>>>>>>
> > > > >>>>>>> * It removes the need to explain the max exploded path
> > > > >>>>>>> length
> > > limitation to
> > > > >>>>>>> customers.
> > > > >>>>>>>
> > > > >>>>>>> Cheers,
> > > > >>>>>>> -Nick
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Feb 19, 2019 at 11:18 AM Robert Newson <
> > > rnewson@apache.org> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> An alternative storage model that we should seriously
> > > > >>>>>>>> consider
> > > is to
> > > > >>>>>>>> follow our current approach in couch_file et al.
> > > > >>>>>>>> Specifically,
> > > that the
> > > > >>>>>>>> document _body_ is stored as an uninterpreted binary
> > > > >>>>>>>> value.
> > > This would be
> > > > >>>>>>>> much like the obvious plan for attachment storage; a key
> > > > >>>>>>>> prefix
> > > that
> > > > >>>>>>>> identifies the database and document, with the final
> > > > >>>>>>>> item of
> > > that key tuple
> > > > >>>>>>>> is an incrementing integer. Each of those keys has a
> > > > >>>>>>>> binary
> > > value of up to
> > > > >>>>>>>> 100k. Fetching all values with that key prefix, in fdb's
> > > natural ordering,
> > > > >>>>>>>> will yield the full document body, which can be JSON
> > > > >>>>>>>> decoded
> > > for further
> > > > >>>>>>>> processing.
> > > > >>>>>>>>
> > > > >>>>>>>> I like this idea, and I like Adam's original proposal to
> > > explode documents
> > > > >>>>>>>> into property paths. I have a slight preference for the
> > > simplicity of the
> > > > >>>>>>>> idea in the previous paragraph, not least because it's
> > > > >>>>>>>> close to
> > > what we do
> > > > >>>>>>>> today. I also think it will be possible to migrate to
> > > alternative storage
> > > > >>>>>>>> models in future, and foundationdb's transaction
> > > > >>>>>>>> supports means
> > > we can do
> > > > >>>>>>>> this migration seamlessly should we come to it.
> > > > >>>>>>>>
> > > > >>>>>>>> I'm very interested in knowing if anyone else is
> > > > >>>>>>>> interested in
> > > going this
> > > > >>>>>>>> simple, or considers it a wasted opportunity relative to
> > > > >>>>>>>> the
> > > 'exploded'
> > > > >>>>>>>> path.
> > > > >>>>>>>>
> > > > >>>>>>>> B.
> > > > >>>>>>>>
> > > > >>>>>>>> --
> > > > >>>>>>>> Robert Samuel Newson
> > > > >>>>>>>> rnewson@apache.org
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, 4 Feb 2019, at 19:59, Robert Newson wrote:
> > > > >>>>>>>>> I've been remiss here in not posting the data model
> > > > >>>>>>>>> ideas that
> > > IBM
> > > > >>>>>>>>> worked up while we were thinking about using
> > > > >>>>>>>>> FoundationDB so
> > > I'm posting
> > > > >>>>>>>>> it now. This is Adam' Kocoloski's original work, I am
> > > > >>>>>>>>> just
> > > transcribing
> > > > >>>>>>>>> it, and this is the context that the folks from the IBM
> > > > >>>>>>>>> side
> > > came in
> > > > >>>>>>>>> with, for full disclosure.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Basics
> > > > >>>>>>>>>
> > > > >>>>>>>>> 1. All CouchDB databases are inside a Directory
> > > > >>>>>>>>> 2. Each CouchDB database is a Directory within that
> > > > >>>>>>>>> Directory
> > > > >>>>>>>>> 3. It's possible to list all subdirectories of a
> > > > >>>>>>>>> Directory, so
> > > > >>>>>>>>> `_all_dbs` is the list of directories from 1.
> > > > >>>>>>>>> 4. Each Directory representing a CouchdB database has
> > > > >>>>>>>>> several
> > > Subspaces;
> > > > >>>>>>>>> 4a. by_id/ doc subspace: actual document contents
> > > > >>>>>>>>> 4b. by_seq/versionstamp subspace: for the _changes feed
> > > > >>>>>>>>> 4c. index_definitions, indexes, ...
> > > > >>>>>>>>>
> > > > >>>>>>>>> JSON Mapping
> > > > >>>>>>>>>
> > > > >>>>>>>>> A hierarchical JSON object naturally maps to multiple
> > > > >>>>>>>>> KV pairs
> > > in FDB:
> > > > >>>>>>>>>
> > > > >>>>>>>>> {
> > > > >>>>>>>>> “_id”: “foo”,
> > > > >>>>>>>>> “owner”: “bob”,
> > > > >>>>>>>>> “mylist”: [1,3,5],
> > > > >>>>>>>>> “mymap”: {
> > > > >>>>>>>>>     “blue”: “#0000FF”,
> > > > >>>>>>>>>     “red”: “#FF0000”
> > > > >>>>>>>>> }
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> maps to
> > > > >>>>>>>>>
> > > > >>>>>>>>> (“foo”, “owner”) = “bob”
> > > > >>>>>>>>> (“foo”, “mylist”, 0) = 1
> > > > >>>>>>>>> (“foo”, “mylist”, 1) = 3
> > > > >>>>>>>>> (“foo”, “mylist”, 2) = 5
> > > > >>>>>>>>> (“foo”, “mymap”, “blue”) = “#0000FF”
> > > > >>>>>>>>> (“foo”, “mymap”, “red”) = “#FF0000”
> > > > >>>>>>>>>
> > > > >>>>>>>>> NB: this means that the 100KB limit applies to
> > > > >>>>>>>>> individual
> > > leafs in the
> > > > >>>>>>>>> JSON object, not the entire doc
> > > > >>>>>>>>>
> > > > >>>>>>>>> Edit Conflicts
> > > > >>>>>>>>>
> > > > >>>>>>>>> We need to account for the presence of conflicts in
> > > > >>>>>>>>> various
> > > levels of
> > > > >>>>>>>>> the doc due to replication.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Proposal is to create a special value indicating that
> > > > >>>>>>>>> the
> > > subtree below
> > > > >>>>>>>>> our current cursor position is in an unresolvable
> > > > >>>>>>>>> conflict.
> > > Then add
> > > > >>>>>>>>> additional KV pairs below to describe the conflicting
> > > > >>>>>>>>> entries.
> > > > >>>>>>>>>
> > > > >>>>>>>>> KV data model allows us to store these efficiently and
> > > > >>>>>>>>> minimize
> > > > >>>>>>>>> duplication of data:
> > > > >>>>>>>>>
> > > > >>>>>>>>> A document with these two conflicts:
> > > > >>>>>>>>>
> > > > >>>>>>>>> {
> > > > >>>>>>>>> “_id”: “foo”,
> > > > >>>>>>>>> “_rev”: “1-abc”,
> > > > >>>>>>>>> “owner”: “alice”,
> > > > >>>>>>>>> “active”: true
> > > > >>>>>>>>> }
> > > > >>>>>>>>> {
> > > > >>>>>>>>> “_id”: “foo”,
> > > > >>>>>>>>> “_rev”: “1-def”,
> > > > >>>>>>>>> “owner”: “bob”,
> > > > >>>>>>>>> “active”: true
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> could be stored thus:
> > > > >>>>>>>>>
> > > > >>>>>>>>> (“foo”, “active”) = true
> > > > >>>>>>>>> (“foo”, “owner”) = kCONFLICT
> > > > >>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice”
> > > > >>>>>>>>> (“foo”, “owner”, “1-def”) = “bob”
> > > > >>>>>>>>>
> > > > >>>>>>>>> So long as `kCONFLICT` is set at the top of the
> > > > >>>>>>>>> conflicting
> > > subtree this
> > > > >>>>>>>>> representation can handle conflicts of different data
> > > > >>>>>>>>> types as
> > > well.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Missing fields need to be handled explicitly:
> > > > >>>>>>>>>
> > > > >>>>>>>>> {
> > > > >>>>>>>>> “_id”: “foo”,
> > > > >>>>>>>>> “_rev”: “1-abc”,
> > > > >>>>>>>>> “owner”: “alice”,
> > > > >>>>>>>>> “active”: true
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> {
> > > > >>>>>>>>> “_id”: “foo”,
> > > > >>>>>>>>> “_rev”: “1-def”,
> > > > >>>>>>>>> “owner”: {
> > > > >>>>>>>>> “name”: “bob”,
> > > > >>>>>>>>> “email”: “
> > > > >>>>>>>>> bob@example.com
> > > > >>>>>>>>> "
> > > > >>>>>>>>> }
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> could be stored thus:
> > > > >>>>>>>>>
> > > > >>>>>>>>> (“foo”, “active”) = kCONFLICT
> > > > >>>>>>>>> (“foo”, “active”, “1-abc”) = true
> > > > >>>>>>>>> (“foo”, “active”, “1-def”) = kMISSING
> > > > >>>>>>>>> (“foo”, “owner”) = kCONFLICT
> > > > >>>>>>>>> (“foo”, “owner”, “1-abc”) = “alice”
> > > > >>>>>>>>> (“foo”, “owner”, “1-def”, “name”) = “bob”
> > > > >>>>>>>>> (“foo”, “owner”, “1-def”, “email”) = ...
> > > > >>>>>>>>>
> > > > >>>>>>>>> Revision Metadata
> > > > >>>>>>>>>
> > > > >>>>>>>>> * CouchDB uses a hash history for revisions
> > > > >>>>>>>>> ** Each edit is identified by the hash of the content
> > > > >>>>>>>>> of the
> > > edit
> > > > >>>>>>>>> including the base revision against which it was
> > > > >>>>>>>>> applied
> > > > >>>>>>>>> ** Individual edit branches are bounded in length but
> > > > >>>>>>>>> the
> > > number of
> > > > >>>>>>>>> branches is potentially unbounded
> > > > >>>>>>>>>
> > > > >>>>>>>>> * Size limits preclude us from storing the entire key
> > > > >>>>>>>>> tree as
> > > a single
> > > > >>>>>>>>> value; in pathological situations
> > > > >>>>>>>>> the tree could exceed 100KB (each entry is > 16 bytes)
> > > > >>>>>>>>>
> > > > >>>>>>>>> * Store each edit branch as a separate KV including
> > > > >>>>>>>>> deleted
> > > status in a
> > > > >>>>>>>>> special subspace
> > > > >>>>>>>>>
> > > > >>>>>>>>> * Structure key representation so that “winning”
> > > > >>>>>>>>> revision can
> > > be
> > > > >>>>>>>>> automatically retrieved in a limit=1
> > > > >>>>>>>>> key range operation
> > > > >>>>>>>>>
> > > > >>>>>>>>> (“foo”, “_meta”, “deleted=false”, 1, “def”) = []
> > > > >>>>>>>>> (“foo”, “_meta”, “deleted=false”, 4, “bif”) =
> > > [“3-baz”,”2-bar”,”1-foo”]
> > > > >>>>>>>>> <-- winner
> > > > >>>>>>>>> (“foo”, “_meta”, “deleted=true”, 3, “abc”) = [“2-bar”,
> > > > >>>>>>>>> “1-foo”]
> > > > >>>>>>>>>
> > > > >>>>>>>>> Changes Feed
> > > > >>>>>>>>>
> > > > >>>>>>>>> * FDB supports a concept called a versionstamp — a 10
> > > > >>>>>>>>> byte,
> > > unique,
> > > > >>>>>>>>> monotonically (but not sequentially) increasing value
> > > > >>>>>>>>> for each
> > > committed
> > > > >>>>>>>>> transaction. The first 8 bytes are the committed
> > > > >>>>>>>>> version of the
> > > > >>>>>>>>> database. The last 2 bytes are monotonic in the
> > > > >>>>>>>>> serialization
> > > order for
> > > > >>>>>>>>> transactions.
> > > > >>>>>>>>>
> > > > >>>>>>>>> * A transaction can specify a particular index into a
> > > > >>>>>>>>> key
> > > where the
> > > > >>>>>>>>> following 10 bytes will be overwritten by the
> > > > >>>>>>>>> versionstamp at
> > > commit
> > > > >>>>>>>>> time
> > > > >>>>>>>>>
> > > > >>>>>>>>> * A subspace keyed on versionstamp naturally yields a
> > > > >>>>>>>>> _changes
> > > feed
> > > > >>>>>>>>>
> > > > >>>>>>>>> by_seq subspace
> > > > >>>>>>>>> (“versionstamp1”) = (“foo”, “1-abc”)
> > > > >>>>>>>>> (“versionstamp4”) = (“bar”, “4-def”)
> > > > >>>>>>>>>
> > > > >>>>>>>>> by_id subspace
> > > > >>>>>>>>> (“bar”, “_vsn”) = “versionstamp4”
> > > > >>>>>>>>> ...
> > > > >>>>>>>>> (“foo”, “_vsn”) = “versionstamp1”
> > > > >>>>>>>>>
> > > > >>>>>>>>> JSON Indexes
> > > > >>>>>>>>>
> > > > >>>>>>>>> * “Mango” JSON indexes are defined by
> > > > >>>>>>>>> ** a list of field names, each of which may be nested,
> > > > >>>>>>>>> ** an optional partial_filter_selector which constrains
> > > > >>>>>>>>> the
> > > set of docs
> > > > >>>>>>>>> that contribute
> > > > >>>>>>>>> ** an optional name defined by the ddoc field (the name
> > > > >>>>>>>>> is
> > > auto-
> > > > >>>>>>>>> generated if not supplied)
> > > > >>>>>>>>>
> > > > >>>>>>>>> * Store index definitions in a single subspace to aid
> > > > >>>>>>>>> query
> > > planning
> > > > >>>>>>>>> ** ((person,name), title, email) = (“name-title-email”,
> > > “{“student”:
> > > > >>>>>>>>> true}”)
> > > > >>>>>>>>> ** Store the values for each index in a dedicated
> > > > >>>>>>>>> subspace,
> > > adding the
> > > > >>>>>>>>> document ID as the last element in the tuple
> > > > >>>>>>>>> *** (“rosie revere”, “engineer”, “rosie@example.com",
> > > > >>>>>>>>> “foo”)
> > > = null
> > > > >>>>>>>>>
> > > > >>>>>>>>> B.
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Robert Samuel Newson
> > > > >>>>>>>>> rnewson@apache.org
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, 4 Feb 2019, at 19:13, Ilya Khlopotov wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I want to fix previous mistakes. I did two mistakes in
> > > previous
> > > > >>>>>>>>>> calculations:
> > > > >>>>>>>>>> - I used 1Kb as base size for calculating expansion
> > > > >>>>>>>>>> factor
> > > (although
> > > > >>>>>>>> we
> > > > >>>>>>>>>> don't know exact size of original document)
> > > > >>>>>>>>>> - The expansion factor calculation included number of
> > > revisions (it
> > > > >>>>>>>>>> shouldn't)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I'll focus on flattened JSON docs model
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The following formula is used in previous calculation.
> > > > >>>>>>>>>>
> > > storage_size_per_document=mapping_table_size*number_of_revisions +
> > > > >>>>>>>>>> depth*number_of_paths*number_of_revisions +
> > > > >>>>>>>>>> number_of_paths*value_size*number_of_revisions
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> To clarify things a little bit I want to calculate
> > > > >>>>>>>>>> space
> > > requirement
> > > > >>>>>>>> for
> > > > >>>>>>>>>> single revision this time.
> > > > >>>>>>>>>>
> > > mapping_table_size=field_name_size*(field_name_length+4(integer
> > > > >>>>>>>>>> size))=100 * (20 + 4(integer size)) = 2400 bytes
> > > > >>>>>>>>>>
> > > storage_size_per_document_per_revision_per_replica=mapping_table_size
> > > > >>>>>>>> +
> > > > >>>>>>>>>> depth*number_of_paths + value_size*number_of_paths =
> > > > >>>>>>>>>> 2400bytes + 10*1000+1000*100=112400bytes~=110 Kb
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> We definitely can reduce requirement for mapping table
> > > > >>>>>>>>>> by
> > > adopting
> > > > >>>>>>>>>> rnewson's idea of a schema.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On 2019/02/04 11:08:16, Ilya Khlopotov
> > > > >>>>>>>>>> <iilyak@apache.org>
> > > wrote:
> > > > >>>>>>>>>>> Hi Michael,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> For example, hears a crazy thought:
> > > > >>>>>>>>>>>> Map every distinct occurence of a key/value instance
> > > through a
> > > > >>>>>>>> crypto hash
> > > > >>>>>>>>>>>> function to get a set of hashes.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> These can be be precomputed by Couch without any
> > > > >>>>>>>>>>>> lookups in
> > > FDB.
> > > > >>>>>>>> These
> > > > >>>>>>>>>>>> will be spread all over kingdom come in FDB and not
> > > > >>>>>>>>>>>> lend
> > > > >>>>>>>> themselves to
> > > > >>>>>>>>>>>> range search well.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> So what you do is index them for frequency of
> > > > >>>>>>>>>>>> occurring in
> > > the
> > > > >>>>>>>> same set.
> > > > >>>>>>>>>>>> In essence, you 'bucket them' statistically, and
> > > > >>>>>>>>>>>> that
> > > bucket id
> > > > >>>>>>>> becomes a
> > > > >>>>>>>>>>>> key prefix. A crypto hash value can be copied into
> > > > >>>>>>>>>>>> more
> > > than one
> > > > >>>>>>>> bucket.
> > > > >>>>>>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> When writing a document, Couch submits the
> > > > >>>>>>>>>>>> list/array of
> > > > >>>>>>>> cryptohash values
> > > > >>>>>>>>>>>> it computed to FDB and gets back the corresponding
> > > {val_id} (the
> > > > >>>>>>>> id with
> > > > >>>>>>>>>>>> the bucket prefixed).  This can get somewhat
> > > > >>>>>>>>>>>> expensive if
> > > there's
> > > > >>>>>>>> always a
> > > > >>>>>>>>>>>> lot of app local cache misses.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> A document's value is then a series of {val_id}
> > > > >>>>>>>>>>>> arrays up
> > > to 100k
> > > > >>>>>>>> per
> > > > >>>>>>>>>>>> segment.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> When retrieving a document, you get the val_ids,
> > > > >>>>>>>>>>>> find the
> > > distinct
> > > > >>>>>>>> buckets
> > > > >>>>>>>>>>>> and min/max entries for this doc, and then parallel
> > > > >>>>>>>>>>>> query
> > > each
> > > > >>>>>>>> bucket while
> > > > >>>>>>>>>>>> reconstructing the document.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Interesting idea. Let's try to think it through to
> > > > >>>>>>>>>>> see if we
> > > can
> > > > >>>>>>>> make it viable.
> > > > >>>>>>>>>>> Let's go through hypothetical example. Input data for
> > > > >>>>>>>>>>> the
> > > example:
> > > > >>>>>>>>>>> - 1M of documents
> > > > >>>>>>>>>>> - each document is around 10Kb
> > > > >>>>>>>>>>> - each document consists of 1K of unique JSON paths
> > > > >>>>>>>>>>> - each document has 100 unique JSON field names
> > > > >>>>>>>>>>> - every scalar value is 100 bytes
> > > > >>>>>>>>>>> - 10% of unique JSON paths for every document already
> > > > >>>>>>>>>>> stored
> > > in
> > > > >>>>>>>> database under different doc or different revision of
> > > > >>>>>>>> the
> > > current one
> > > > >>>>>>>>>>> - we assume 3 independent copies for every key-value
> > > > >>>>>>>>>>> pair in
> > > FDB
> > > > >>>>>>>>>>> - our hash key size is 32 bytes
> > > > >>>>>>>>>>> - let's assume we can determine if key is already on
> > > > >>>>>>>>>>> the
> > > storage
> > > > >>>>>>>> without doing query
> > > > >>>>>>>>>>> - 1% of paths is in cache (unrealistic value, in real
> > > > >>>>>>>>>>> live
> > > the
> > > > >>>>>>>> percentage is lower)
> > > > >>>>>>>>>>> - every JSON field name is 20 bytes
> > > > >>>>>>>>>>> - every JSON path is 10 levels deep
> > > > >>>>>>>>>>> - document key prefix length is 50
> > > > >>>>>>>>>>> - every document has 10 revisions
> > > > >>>>>>>>>>> Let's estimate the storage requirements and size of
> > > > >>>>>>>>>>> data we
> > > need to
> > > > >>>>>>>> transmit. The calculations are not exact.
> > > > >>>>>>>>>>> 1. storage_size_per_document (we cannot estimate
> > > > >>>>>>>>>>> exact
> > > numbers since
> > > > >>>>>>>> we don't know how FDB stores it)
> > > > >>>>>>>>>>> - 10 * ((10Kb - (10Kb * 10%)) + (1K - (1K * 10%)) *
> > > > >>>>>>>>>>> 32
> > > bytes) =
> > > > >>>>>>>> 38Kb * 10 * 3 = 1140 Kb (11x)
> > > > >>>>>>>>>>> 2. number of independent keys to retrieve on document
> > > > >>>>>>>>>>> read
> > > > >>>>>>>> (non-range queries) per document
> > > > >>>>>>>>>>> - 1K - (1K * 1%) = 990
> > > > >>>>>>>>>>> 3. number of range queries: 0
> > > > >>>>>>>>>>> 4. data to transmit on read: (1K - (1K * 1%)) * (100
> > > > >>>>>>>>>>> bytes +
> > > 32
> > > > >>>>>>>> bytes) = 102 Kb (10x)
> > > > >>>>>>>>>>> 5. read latency (we use 2ms per read based on numbers
> > > > >>>>>>>>>>> from
> > > > >>>>>>>> https://apple.github.io/foundationdb/performance.html)
> > > > >>>>>>>>>>> - sequential: 990*2ms = 1980ms
> > > > >>>>>>>>>>> - range: 0
> > > > >>>>>>>>>>> Let's compare these numbers with initial proposal
> > > > >>>>>>>>>>> (flattened
> > > JSON
> > > > >>>>>>>> docs without global schema and without cache)
> > > > >>>>>>>>>>> 1. storage_size_per_document
> > > > >>>>>>>>>>> - mapping table size: 100 * (20 + 4(integer size)) =
> > > > >>>>>>>>>>> 2400
> > > bytes
> > > > >>>>>>>>>>> - key size: (10 * (4 + 1(delimiter))) + 50 = 100
> > > > >>>>>>>>>>> bytes
> > > > >>>>>>>>>>> - storage_size_per_document: 2.4K*10 + 100*1K*10 +
> > > > >>>>>>>>>>> 1K*100*10
> > > =
> > > > >>>>>>>> 2024K = 1976 Kb * 3 = 5930 Kb (59.3x)
> > > > >>>>>>>>>>> 2. number of independent keys to retrieve: 0-2
> > > > >>>>>>>>>>> (depending on
> > > index
> > > > >>>>>>>> structure)
> > > > >>>>>>>>>>> 3. number of range queries: 1 (1001 of keys in
> > > > >>>>>>>>>>> result)
> > > > >>>>>>>>>>> 4. data to transmit on read: 24K + 1000*100 +
> > > > >>>>>>>>>>> 1000*100 =
> > > 23.6 Kb
> > > > >>>>>>>> (2.4x)
> > > > >>>>>>>>>>> 5. read latency (we use 2ms per read based on numbers
> > > > >>>>>>>>>>> from
> > > > >>>>>>>> https://apple.github.io/foundationdb/performance.html
> > > > >>>>>>>> and
> > > estimate range
> > > > >>>>>>>> read performance based on numbers from
> > > > >>>>>>>>
> > >
> https://apple.github.io/foundationdb/benchmarking.html#single-core-read-test
> > > > >>>>>>>> )
> > > > >>>>>>>>>>> - range read performance: Given read performance is
> > > > >>>>>>>>>>> about
> > > 305,000
> > > > >>>>>>>> reads/second and range performance 3,600,000 keys/second
> > > > >>>>>>>> we
> > > estimate range
> > > > >>>>>>>> performance to be 11.8x compared to read performance. If
> > > > >>>>>>>> read
> > > performance
> > > > >>>>>>>> is 2ms than range performance is 0.169ms (which is hard
> > > > >>>>>>>> to
> > > believe).
> > > > >>>>>>>>>>> - sequential: 2 * 2 = 4ms
> > > > >>>>>>>>>>> - range: 0.169
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> It looks like we are dealing with a tradeoff:
> > > > >>>>>>>>>>> - Map every distinct occurrence of a key/value
> > > > >>>>>>>>>>> instance
> > > through a
> > > > >>>>>>>> crypto hash:
> > > > >>>>>>>>>>> - 5.39x more disk space efficient
> > > > >>>>>>>>>>> - 474x slower
> > > > >>>>>>>>>>> - flattened JSON model
> > > > >>>>>>>>>>> - 5.39x less efficient in disk space
> > > > >>>>>>>>>>> - 474x faster
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> In any case this unscientific exercise was very
> > > > >>>>>>>>>>> helpful.
> > > Since it
> > > > >>>>>>>> uncovered the high cost in terms of disk space. 59.3x of
> > > original disk size
> > > > >>>>>>>> is too much IMO.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Are the any ways we can make Michael's model more
> > > > >>>>>>>>>>> performant?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Also I don't quite understand few aspects of the
> > > > >>>>>>>>>>> global hash
> > > table
> > > > >>>>>>>> proposal:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 1. > - Map every distinct occurence of a key/value
> > > > >>>>>>>>>>> instance
> > > through
> > > > >>>>>>>> a crypto hash function to get a set of hashes.
> > > > >>>>>>>>>>> I think we are talking only about scalar values here?
> > > > >>>>>>>>>>> I.e.
> > > > >>>>>>>> `"#/foo.bar.baz": 123`
> > > > >>>>>>>>>>> Since I don't know how we can make it work for all
> > > > >>>>>>>>>>> possible
> > > JSON
> > > > >>>>>>>> paths `{"foo": {"bar": {"size": 12, "baz": 123}}}":
> > > > >>>>>>>>>>> - foo
> > > > >>>>>>>>>>> - foo.bar
> > > > >>>>>>>>>>> - foo.bar.baz
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 2. how to delete documents
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Best regards,
> > > > >>>>>>>>>>> ILYA
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On 2019/01/30 23:33:22, Michael Fair <
> > > michael@daclubhouse.net>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>>>>> On Wed, Jan 30, 2019, 12:57 PM Adam Kocoloski <
> > > kocolosk@apache.org
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Hi Michael,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> The trivial fix is to use DOCID/REVISIONID as
> > > > >>>>>>>>>>>>>> DOC_KEY.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Yes that’s definitely one way to address storage of
> > > > >>>>>>>>>>>>> edit
> > > > >>>>>>>> conflicts. I
> > > > >>>>>>>>>>>>> think there are other, more compact representations
> > > > >>>>>>>>>>>>> that
> > > we can
> > > > >>>>>>>> explore if
> > > > >>>>>>>>>>>>> we have this “exploded” data model where each
> > > > >>>>>>>>>>>>> scalar value
> > > maps
> > > > >>>>>>>> to an
> > > > >>>>>>>>>>>>> individual KV pair.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I agree, as I mentioned on the original thread, I
> > > > >>>>>>>>>>>> see a
> > > scheme,
> > > > >>>>>>>> that
> > > > >>>>>>>>>>>> handles both conflicts and revisions, where you only
> > > > >>>>>>>>>>>> have
> > > to store
> > > > >>>>>>>> the most
> > > > >>>>>>>>>>>> recent change to a field.  Like you suggested,
> > > > >>>>>>>>>>>> multiple
> > > revisions
> > > > >>>>>>>> can share
> > > > >>>>>>>>>>>> a key.  Which in my mind's eye further begs the
> > > conflicts/revisions
> > > > >>>>>>>>>>>> discussion along with the working within the limits
> > > discussion
> > > > >>>>>>>> because it
> > > > >>>>>>>>>>>> seems to me they are all intrinsically related as a
> > > "feature".
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Saying 'We'll break documents up into roughly 80k
> > > segments', then
> > > > >>>>>>>> trying to
> > > > >>>>>>>>>>>> overlay some kind of field sharing scheme for
> > > revisions/conflicts
> > > > >>>>>>>> doesn't
> > > > >>>>>>>>>>>> seem like it will work.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I probably should have left out the trivial fix
> > > > >>>>>>>>>>>> proposal as
> > > I
> > > > >>>>>>>> don't think
> > > > >>>>>>>>>>>> it's a feasible solution to actually use.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The comment is more regarding that I do not see how
> > > > >>>>>>>>>>>> this
> > > thread
> > > > >>>>>>>> can escape
> > > > >>>>>>>>>>>> including how to store/retrieve conflicts/revisions.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> For instance, the 'doc as individual fields'
> > > > >>>>>>>>>>>> proposal lends
> > > itself
> > > > >>>>>>>> to value
> > > > >>>>>>>>>>>> sharing across mutiple documents (and I don't just
> > > > >>>>>>>>>>>> mean
> > > revisions
> > > > >>>>>>>> of the
> > > > >>>>>>>>>>>> same doc, I mean the same key/value instance could
> > > > >>>>>>>>>>>> be
> > > shared for
> > > > >>>>>>>> every
> > > > >>>>>>>>>>>> document).
> > > > >>>>>>>>>>>> However that's not really relevant if we're not
> > > > >>>>>>>>>>>> considering
> > > the
> > > > >>>>>>>> amount of
> > > > >>>>>>>>>>>> shared information across documents in the storage
> > > > >>>>>>>>>>>> scheme.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Simply storing documents in <100k segments (perhaps
> > > > >>>>>>>>>>>> in some
> > > kind of
> > > > >>>>>>>>>>>> compressed binary representation) to deal with that
> > > > >>>>>>>>>>>> FDB
> > > limit
> > > > >>>>>>>> seems fine.
> > > > >>>>>>>>>>>> The only reason to consider doing something else is
> > > > >>>>>>>>>>>> because
> > > of its
> > > > >>>>>>>> impact
> > > > >>>>>>>>>>>> to indexing, searches, reduce functions, revisions,
> > > > >>>>>>>>>>>> on-disk
> > > size
> > > > >>>>>>>> impact,
> > > > >>>>>>>>>>>> etc.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I'm assuming the process will flatten the key
> > > > >>>>>>>>>>>>>> paths of the
> > > > >>>>>>>> document into
> > > > >>>>>>>>>>>>> an array and then request the value of each key as
> > > > >>>>>>>>>>>>> multiple
> > > > >>>>>>>> parallel
> > > > >>>>>>>>>>>>> queries against FDB at once
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Ah, I think this is not one of Ilya’s assumptions.
> > > > >>>>>>>>>>>>> He’s
> > > trying
> > > > >>>>>>>> to design a
> > > > >>>>>>>>>>>>> model which allows the retrieval of a document with
> > > > >>>>>>>>>>>>> a
> > > single
> > > > >>>>>>>> range read,
> > > > >>>>>>>>>>>>> which is a good goal in my opinion.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I am not sure I agree.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Think of bitTorrent, a single range read should pull
> > > > >>>>>>>>>>>> back
> > > the
> > > > >>>>>>>> structure of
> > > > >>>>>>>>>>>> the document (the pieces to fetch), but not
> > > > >>>>>>>>>>>> necessarily the
> > > whole
> > > > >>>>>>>> document.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> What if you already have a bunch of pieces in common
> > > > >>>>>>>>>>>> with
> > > other
> > > > >>>>>>>> documents
> > > > >>>>>>>>>>>> locally (a repeated header/footer/ or type for
> > > > >>>>>>>>>>>> example);
> > > and you
> > > > >>>>>>>> only need
> > > > >>>>>>>>>>>> to get a few pieces of data you don't already have?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The real goal to Couch I see is to treat your
> > > > >>>>>>>>>>>> document set
> > > like the
> > > > >>>>>>>>>>>> collection of structured information that it is.  In
> > > > >>>>>>>>>>>> some
> > > respects
> > > > >>>>>>>> like an
> > > > >>>>>>>>>>>> extension of your application's heap space for
> > > > >>>>>>>>>>>> structured
> > > objects
> > > > >>>>>>>> and
> > > > >>>>>>>>>>>> efficiently querying that collection to get back
> > > > >>>>>>>>>>>> subsets of
> > > the
> > > > >>>>>>>> data.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Otherwise it seems more like a slightly upgraded
> > > > >>>>>>>>>>>> file
> > > system plus
> > > > >>>>>>>> a fancy
> > > > >>>>>>>>>>>> grep/find like feature...
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The best way I see to unlock more features/power is
> > > > >>>>>>>>>>>> to a
> > > move
> > > > >>>>>>>> towards a
> > > > >>>>>>>>>>>> more granular and efficient way to store and
> > > > >>>>>>>>>>>> retrieve the
> > > scalar
> > > > >>>>>>>> values...
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> For example, hears a crazy thought:
> > > > >>>>>>>>>>>> Map every distinct occurence of a key/value instance
> > > through a
> > > > >>>>>>>> crypto hash
> > > > >>>>>>>>>>>> function to get a set of hashes.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> These can be be precomputed by Couch without any
> > > > >>>>>>>>>>>> lookups in
> > > FDB.
> > > > >>>>>>>> These
> > > > >>>>>>>>>>>> will be spread all over kingdom come in FDB and not
> > > > >>>>>>>>>>>> lend
> > > > >>>>>>>> themselves to
> > > > >>>>>>>>>>>> range search well.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> So what you do is index them for frequency of
> > > > >>>>>>>>>>>> occurring in
> > > the
> > > > >>>>>>>> same set.
> > > > >>>>>>>>>>>> In essence, you 'bucket them' statistically, and
> > > > >>>>>>>>>>>> that
> > > bucket id
> > > > >>>>>>>> becomes a
> > > > >>>>>>>>>>>> key prefix. A crypto hash value can be copied into
> > > > >>>>>>>>>>>> more
> > > than one
> > > > >>>>>>>> bucket.
> > > > >>>>>>>>>>>> The {bucket_id}/{cryptohash} becomes a {val_id}
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> When writing a document, Couch submits the
> > > > >>>>>>>>>>>> list/array of
> > > > >>>>>>>> cryptohash values
> > > > >>>>>>>>>>>> it computed to FDB and gets back the corresponding
> > > {val_id} (the
> > > > >>>>>>>> id with
> > > > >>>>>>>>>>>> the bucket prefixed).  This can get somewhat
> > > > >>>>>>>>>>>> expensive if
> > > there's
> > > > >>>>>>>> always a
> > > > >>>>>>>>>>>> lot of app local cache misses.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> A document's value is then a series of {val_id}
> > > > >>>>>>>>>>>> arrays up
> > > to 100k
> > > > >>>>>>>> per
> > > > >>>>>>>>>>>> segment.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> When retrieving a document, you get the val_ids,
> > > > >>>>>>>>>>>> find the
> > > distinct
> > > > >>>>>>>> buckets
> > > > >>>>>>>>>>>> and min/max entries for this doc, and then parallel
> > > > >>>>>>>>>>>> query
> > > each
> > > > >>>>>>>> bucket while
> > > > >>>>>>>>>>>> reconstructing the document.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The values returned from the buckets query are the
> > > > >>>>>>>>>>>> key/value
> > > > >>>>>>>> strings
> > > > >>>>>>>>>>>> required to reassemble this document.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> ----------
> > > > >>>>>>>>>>>> I put this forward primarily to hilite the idea that
> > > > >>>>>>>>>>>> trying
> > > to
> > > > >>>>>>>> match the
> > > > >>>>>>>>>>>> storage representation of documents in a straight
> > > > >>>>>>>>>>>> forward
> > > way to
> > > > >>>>>>>> FDB keys
> > > > >>>>>>>>>>>> to reduce query count might not be the most
> > > > >>>>>>>>>>>> performance
> > > oriented
> > > > >>>>>>>> approach.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd much prefer a storage approach that reduced data
> > > duplication
> > > > >>>>>>>> and
> > > > >>>>>>>>>>>> enabled fast sub-document queries.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> This clearly falls in the realm of what people want
> > > > >>>>>>>>>>>> the
> > > 'use case'
> > > > >>>>>>>> of Couch
> > > > >>>>>>>>>>>> to be/become.  By giving Couch more access to
> > > > >>>>>>>>>>>> sub-document
> > > > >>>>>>>> queries, I could
> > > > >>>>>>>>>>>> eventually see queries as complicated as GraphQL
> > > > >>>>>>>>>>>> submitted
> > > to
> > > > >>>>>>>> Couch and
> > > > >>>>>>>>>>>> pulling back ad-hoc aggregated data across multiple
> > > documents in a
> > > > >>>>>>>> single
> > > > >>>>>>>>>>>> application layer request.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Hehe - one way to look at the database of Couch
> > > > >>>>>>>>>>>> documents
> > > is that
> > > > >>>>>>>> they are
> > > > >>>>>>>>>>>> all conflict revisions of the single root empty
> > > > >>>>>>>>>>>> document.
> > >  What I
> > > > >>>>>>>> mean be
> > > > >>>>>>>>>>>> this is consider thinking of the entire document
> > > > >>>>>>>>>>>> store as
> > > one
> > > > >>>>>>>> giant DAG of
> > > > >>>>>>>>>>>> key/value pairs. How even separate documents are
> > > > >>>>>>>>>>>> still
> > > typically
> > > > >>>>>>>> related to
> > > > >>>>>>>>>>>> each other.  For most applications there is a
> > > > >>>>>>>>>>>> tremendous
> > > amount of
> > > > >>>>>>>> data
> > > > >>>>>>>>>>>> redundancy between docs and especially between
> > > > >>>>>>>>>>>> revisions of
> > > those
> > > > >>>>>>>> docs...
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> And all this is a long way of saying "I think there
> > > > >>>>>>>>>>>> could
> > > be a lot
> > > > >>>>>>>> of value
> > > > >>>>>>>>>>>> in assuming documents are 'assembled' from multiple
> > > > >>>>>>>>>>>> queries
> > > to
> > > > >>>>>>>> FDB, with
> > > > >>>>>>>>>>>> local caching, instead of simply retrieved"
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Thanks, I hope I'm not the only outlier here
> > > > >>>>>>>>>>>> thinking this
> > > way!?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Mike :-)
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >
> > > >
> > >
> >
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message