couchdb-dev mailing list archives

From Reddy B. <redd...@live.fr>
Subject RE: [DISCUSS] : things we need to solve/decide : storing JSON documents
Date Fri, 01 Feb 2019 09:11:32 GMT
By the way, if the FDB migration were to happen, would CouchDB continue to be a schema-less database
where we can just drop in our documents and map/reduce them without further ceremony?

I mean, for the long term, is there a commitment to keeping this feature? This is a big deal:
it is the foundation of CouchDB, and I think it is the first assumption you make when you use
CouchDB today.

I'm not trying to add toxicity to this very positive, constructive and high-quality discussion,
just to offer some humble feedback. As a user, when I see this being questioned, along with the
other limitations introduced by FDB, I start to wonder whether rebasing is just a politically
correct way of saying that CouchDB is being retired, because many once-core features are now
becoming optional extensions that have yet to be implemented.

This makes me wonder "what's the core?" and question the cost/benefit analysis of the switch
in light of the current vision of the project. It is starting to look as if FDB may be used
not only as an implementation convenience but as a new vision for CouchDB (deprecating
the former vision). Under a new vision the cost/benefit analysis would make sense, but such a
change in vision has not been publicly announced.

And this would mean that today's core features are likely to go the way of Couchapps tomorrow
if the vision has indeed changed. This is a very problematic uncertainty for an end user thinking
about long-term support for new projects. I totally appreciate that this is a dev mailing list where
ideas are bounced around and technical details worked out, but it's important for us as users to
see commitments on vision, hence my question. I also took this opportunity to voice the more
general concern above.

But the specific question is: what's the vision for "schema-less" usage of CouchDB?

Thanks



________________________________
From: Ilya Khlopotov <iilyak@apache.org>
Sent: Wednesday, January 30, 2019 22:08
To: dev@couchdb.apache.org
Subject: Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.

This approach would require inventing schema evolution features similar to those of the recently
open-sourced Record Layer: https://www.foundationdb.org/files/record-layer-paper.pdf
Because CouchDB is a NoSQL (i.e. schema-less) database, I am sure some CouchDB users:
- rename fields
- reuse field names for something else when they update application
- remove fields
- have documents of different structure in one database

> I think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
In the case of a global mapping we would:
- call get_schema against a different subspace (i.e. contact different nodes)
- extract all scalar values by issuing an FDB range query (most likely all values are co-located)
- stitch the document together and return it to the user

In the case of a local mapping we don't need to call get_schema; the schema would be returned by
the same range query.

We would have to stitch the document together in either case.
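
As a rough illustration of the two read paths above, here is a minimal Python sketch. It uses a tiny in-memory sorted map as a stand-in for an ordered key-value store and does not use the FoundationDB API. The key layouts, the `SortedKV` class, and the `read_global` / `read_local` helpers are all assumptions made for illustration, and documents are limited to flat, scalar fields. The `reads` counter counts range reads, so the global layout shows the extra schema lookup.

```python
from bisect import bisect_left


class SortedKV:
    """In-memory stand-in for an ordered key-value store (NOT the FDB API)."""

    def __init__(self):
        self._keys = []   # tuple keys, kept sorted
        self._vals = {}
        self.reads = 0    # number of range reads issued

    def set(self, key, value):
        if key not in self._vals:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._vals[key] = value

    def get_range(self, prefix):
        """Return (key, value) pairs whose tuple key starts with prefix."""
        self.reads += 1
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            out.append((self._keys[i], self._vals[self._keys[i]]))
            i += 1
        return out


def read_global(kv, doc_id):
    # Hypothetical layout: ("schema", field_id) -> field name
    #                      ("docs", doc_id, field_id) -> scalar value
    # The schema lives in its own subspace, hence a separate read.
    schema = {key[1]: name for key, name in kv.get_range(("schema",))}
    values = kv.get_range(("docs", doc_id))
    # "Stitch" the document back together from ids and values.
    return {schema[key[2]]: value for key, value in values}


def read_local(kv, doc_id):
    # Hypothetical layout: ("docs", doc_id, "m", field_id) -> field name
    #                      ("docs", doc_id, "v", field_id) -> scalar value
    # Mapping and values share the document prefix, so one read returns both.
    rows = kv.get_range(("docs", doc_id))
    names = {key[3]: value for key, value in rows if key[2] == "m"}
    return {names[key[3]]: value for key, value in rows if key[2] == "v"}
```

In this toy model `read_global` issues two range reads and `read_local` issues one, and both still have to rebuild the document from its parts.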

Can you elaborate if my understanding is not correct (I didn't quite understand the "Couch
Range fetch" part of your question)?

best regards,
iilyak

On 2019/01/30 20:11:18, Michael Fair <michael@daclubhouse.net> wrote:
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <iilyak@apache.org> wrote:
>
> > FoundationDB Record Layer uses a global schema for JSON documents. It
> > also has a nice way of creating indexes and schema evolution support.
> > However, this support comes at the cost of extra lookups in a different
> > subspace. With a local mapping table we are almost certain (except in a
> > corner case) that the schema and JSON fields would be co-located on a
> > single node, due to the common prefix.
> >
>
> In general I think I prefer the global, but separate, key mapping idea and
> use FDB's "cache the important, frequently accessed data, across
> distributed memory" features.
>
> I think I prefer the idea of indexing all document's keys using the same
> identifier set.  In general I think applications have the behavior that
> some keys are referenced far more than other keys and giving those keys in
> each document the same value I think could eventually prove useful for
> making many features faster and easier than expected.
>
> While I really like the independence and locality of a document local
> mapping, when I think about the process of transforming a document's keys
> into that mapping's values, I don't see a particular advantage regarding
> where in the DB that key mapping came from.  I'm assuming the process will
> flatten the key paths of the document into an array and then request the
> value of each key as multiple parallel queries against FDB at once.  I
> think regardless of whether the mapping is document local or global, having
> FDB return those individual values is faster/easier than having Couch Range
> fetch the mapping and do the translation work itself.
>
> I could even see some periodic "reorganizing" engine that could renumber
> frequently used keys to make the reverse transformation back into a value
> that much faster.
>
>
> > > Personally I wonder if the 10KB limit on field paths is anything more
> > > than a theoretical concern. It’s hard for me to imagine a useful schema
> > > that would get anywhere near that deep, but maybe I’m insufficiently
> > > creative :)
>
>
> +1
>
>
> > There’s certainly a storage overhead from repeating the upper portion of a
> > path over and over again, but that’s also something the storage engine can
> > optimize away through prefix elision. The current production storage engine
> > in FoundationDB does not do this elision, but the new one in development
> > does.
> >
>
> Assuming it only does "prefix" and not "segment", then I don't think this
> will help because the DOCID for each key in JSON_PATH will be different,
> making the "prefix" to each path across different documents distinct.  The
> prefix matching engine will only be able to match up to the key element
> before the DOCID.
>
> Does/Could/Would the engine allow an app to use FDB itself to create a
> mapping identifier for key "segments" or some other method to "skip past"
> the distinct parts of keys to in a sense "reroot" the search?
>
> If FDB were to "bake in" this "key segment mapping" idea as something it
> exposed to the application layer, that'd be awesome!  Lots of applications
> could probably make use of that.
>
> Mike
>
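
The flattening step and the prefix argument in the quoted message can be sketched in a few lines of Python. This is only an illustration under assumed names: the `flatten` and `shared_prefix_len` helpers, the `("mydb", doc_id, ...)` key layout, and the sample document are hypothetical, and nothing here reflects how FoundationDB's storage engine actually implements prefix elision.

```python
def flatten(value, prefix):
    """Yield (key, scalar) pairs, one per leaf, where key = prefix + path.

    Empty objects and arrays produce no keys in this sketch.
    """
    if isinstance(value, dict):
        for name, child in value.items():
            yield from flatten(child, prefix + (name,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, prefix + (index,))
    else:
        yield prefix, value


def shared_prefix_len(a, b):
    """Number of leading key elements two keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


doc = {"owner": {"name": "Ann", "address": {"city": "Paris"}}}

# Assumed key layout: (database, doc_id, path elements...)
keys_doc1 = [key for key, _ in flatten(doc, ("mydb", "doc1"))]
keys_doc2 = [key for key, _ in flatten(doc, ("mydb", "doc2"))]

# Within one document, keys share the database, the doc id and "owner".
within = shared_prefix_len(keys_doc1[0], keys_doc1[1])

# Across documents, the shared prefix stops just before the doc id,
# even though the JSON paths after it are identical.
across = shared_prefix_len(keys_doc1[0], keys_doc2[0])

print(within, across)
```

With this layout a pure prefix scheme can reuse the upper path only inside a single document; sharing identical paths across documents would need something like the "segment" mapping asked about above.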
