couchdb-dev mailing list archives

From "Florian Westreicher Bakk.techn." <st...@meredrica.org>
Subject Re: [PROPOSAL] new underscore namespacing
Date Sun, 22 Dec 2013 11:42:44 GMT
I would also prefer not to have metadata in the form of headers. AFAIK headers are limited
in length, and setting them is not as straightforward as putting some JSON into the request
body. You would also have to carry over all the header information from the GET request when
you update a doc or its metadata, which is not easy in some situations.
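
To make the comparison concrete, here is a rough sketch, not anything CouchDB does today. The 8192-byte limit is an assumed value standing in for whatever a given server or proxy enforces, and both function names are made up for illustration.

```python
import json

# Assumption: stand-in for a per-header size limit enforced by some server or proxy.
ASSUMED_HEADER_LIMIT = 8192


def metadata_as_header(meta):
    """Serialize metadata into an ASCII-only string usable as a header value."""
    # ensure_ascii=True escapes non-ASCII characters as \uXXXX sequences.
    value = json.dumps(meta, ensure_ascii=True, separators=(",", ":"))
    if len(value) > ASSUMED_HEADER_LIMIT:
        raise ValueError("metadata too large for a single header")
    return value


def metadata_in_body(meta, body):
    """Put the metadata next to the content in one JSON request body."""
    # Metadata keys win on a name clash, as with the reserved fields today.
    return json.dumps({**body, **meta})
```

With the header variant the client needs the extra escaping and the size check, and has to copy the value back on every update. With the body variant it just sends back the JSON it received.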

That said, I like the idea of user-writable metadata, since I have quite a few use cases for
it. Right now I handle my application's metadata in the document itself, which is not
the right place.

Benoit Chesneau <bchesneau@gmail.com> wrote:
>On Wed, Dec 18, 2013 at 5:39 PM, Volker Mische <volker.mische@gmail.com> wrote:
>
>> On 12/03/2013 07:12 PM, Benoit Chesneau wrote:
>> > On Tue, Dec 3, 2013 at 3:01 PM, Benjamin Young <byoung@bigbluehat.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> Recently the "doc._*" reservation has been causing me trouble when pulling
>> >> in "arbitrary" JSON from various sources that also use the underscore
>> >> prefixed names for things (HAL [1], vnd.error [2], other APIs). I've also
>> >> hit the wall several times when trying to import filesystem contents
>> >> (Sphinx, ghpages, and the like) that use _* prefixing for their "special
>> >> folders."
>> >>
>> >> As such, I'd like to propose the following:
>> >> 1. Begin storing new reserved terms in doc._.* (rather than doc._*).
>> >>  - this gives developers one object to look into for the meta-data about a
>> >> doc
>> >>  - you can see the scope creep of our current doc._* best in the
>> >> replicator status messages.
>> >>     - doc._replication_* would become doc._.replication.*
>> >> 2. Move "magic" API endpoints under a "/_/" term as well (for the sake of
>> >> attachments).
>> >>  - _design/doc would stay the same
>> >>  - but the child endpoints would live under "_design/doc/_/*"
>> >>     - _design/doc/_/view/by_date
>> >>     - _design/doc/_/list/by_date/ul
>> >>     - _design/doc/_/rewrite
>> >>
>> >> I realize these are extreme API shifts, and would need to wait for CouchDB
>> >> 2.0.
>> >>
>> >> The first steps in this direction would be to put new reserved word keys into
>> >> a "doc._.*" namespace going forward. Closer to the "cut over" for 2.0,
>> >> duplicates of the existing keys (doc._id, doc._rev, especially) could also
>> >> live at their new underscore prefixed names (doc._.id, doc._.rev), which
>> >> would give devs a chance to migrate code and content.
>> >>
>> >> Doing this would:
>> >> 1. Give us "limitless" space to add content.
>> >> 2. Encourage a namespacing pattern for things like doc._.replication.* or
>> >> other logically grouped content.
>> >> 3. Free up CouchDB to accept a far broader range of content and remove the
>> >> "hey, you can't put that there! I was here first!" errors. :)
>> >>
>> >> Thanks for considering this,
>> >> Benjamin
>> >>
>> >> [1] http://stateless.co/hal_specification.html
>> >> [2] https://github.com/blongden/vnd.error
>> >>
>> >
>> > I don't see why couchdb should adapt itself to newer things that didn't
>> > take care of an older API when doing their stuff but that's probably
>> > another concern ;)
>> >
>> > I would find a "/_/" in the URL rather ugly and not needed in that case.
>> > Same for having a _ in a doc. Also it doesn't make much sense. Why do you
>> > want to change the HTTP API at that level?
>> >
>> > Another way to do it, and probably more RESTish, would be moving all couchdb
>> > resources into their own namespace, say `couchdb/` for example, so anything
>> > in the resource couchdb will be related to couchdb.
>> >
>> > Next is the prefix "_" in the doc. It's actually reserved because
>> > some day we will add other metadata, which is fine. But it raises
>> > the issue you have.
>> >
>> > If I summarise the discussion here and the preceding discussions, there are
>> > different schools of thought:
>> >
>> > - remove the metadata from the doc and put them in headers or aside. I
>> > quite like the first solution, though it may be a problem behind some
>> > proxies, or with the header length (especially for json values). Also
>> > headers are supposed to be in latin1 in a lot of clients...
>> > - put the metadata in their own namespace which is what you propose.
>> >
>> > I dislike the last solution. Mostly because it would force the clients to
>> > wait for this namespace to read the metadata while parsing the JSON (which
>> > could be when streaming it). Instead I would prefer to keep them at the
>> > first level and do the reverse: put the data in their own namespace, say
>> > `_data`. This allows any client to ignore this layer if needed while
>> > parsing the JSON and get it directly (without parsing it then). The metadata
>> > should be the first-class citizen imo. Optionally we could add some new
>> > parameters to the doc API allowing someone to only fetch the metadata,
>> > etc. Also couchdb could parse the incoming doc, stop parsing the
>> > json when seeing this property, and store it directly. It also follows
>> > the logic of attachments somehow. Another thing that could be done at the
>> > API level is having something like `/db/docid/_data`, which would allow you to
>> > only retrieve the data instead of using a show function.
>> >
>> > What do you think?
>> >
>> > - benoit
>>
>> Hi all,
>>
>> I've been talking with Benoit about this at the CouchHack. I think his
>> proposal makes a lot of sense. Let's take the separation of meta and the
>> document body (as I proposed) together with what Benoit said.
>>
>> When storing the actual data in a top-level property called "_data", you
>> could easily extract the meta information, without parsing the body at
>> all. You just need to parse all the top-level properties (which you need
>> to do anyway, as JSON doesn't guarantee any ordering of properties).
>>
>> Having this could be a great first step towards making meta and document
>> body separation easier to implement.
>>
>> In a next step you could then e.g. provide an API where you just send
>> the document body, with the meta as headers.
>>
>> Cheers,
>>   Volker
>>
>>
>
>I recently started a new project where having the metadata and the content
>separated would make a lot of sense. Here is a quick summary, in no particular
>order, of my thinking about it.
>
>- With our current concurrency model, it makes sense to have the metadata
>coming with the document. Having them come in a separate commit/doc would
>create a lot of problems in a distributed environment (what happens when a
>doc is edited in 2 places and the metadata is updated separately?). Our revision
>model is here to solve such things.
>
>- It would be interesting to let the user set their own metadata coming with
>a document. We could imagine someone adding timestamps, another adding
>authentication info, etc. Some metadata could also be hidden from the user
>that replicates or fetches the doc. Metadata should really be thought of as a
>description of the doc and the way it will be shared/stored, nothing more,
>i.e. mainly used for internal purposes, and some could be local to a node. It
>also answers the original problem that started this thread: we could
>design some entry points in the API that only return the body of the doc
>(without its metadata) so the clients would be happy with it. Or something
>like that.
>
>- I like attachments. Transforming couchdb into another object database (aka
>blob store) is neither really that interesting nor innovative. In the end,
>most of the users of blob storage are also using a database to index
>the objects, whereas in an attachment model we are attaching a blob to its
>structured description in the doc. Such a description can then be indexed
>using the views. I think in the future couchdb should consider attachments
>as links attached to a doc. Such a link could be internal like it is now but
>also external. The remote link would be transparently handled for the one
>that replicates, but in the end we could eventually attach a blob from an
>external source. We could also link another doc.... (digression spotted)
>
>- About metadata sent outside the JSON or as a header, I have a preference
>for having them in the JSON sent to couchdb. Mainly because we could then
>support any protocol other than HTTP without having to support different
>ways to read the metadata coming with the document. Someone who wants to
>just use TCP to pass the doc could then just handle the transport logic and
>give couchdb the JSON, which will then be indexed, whereas in other cases
>the transport will also need to manage how it gets the metadata.
>
>- If we have the metadata in a JSON, as VMX already said, it's much
>more efficient to have the metadata at the first level and make the content
>available in a `_data` (or `_body`) property. We could then parse the JSON
>to fetch all the metadata and skip parsing the `_data` member, which will then
>be stored on disk. Doing it the other way (having metadata in a `_meta` property)
>wouldn't be efficient at all due to the nature of JSON: there is no
>guarantee about the order of the properties. Also we will generally have
>content bigger than the metadata (most docs will only have the `_id` and
>`_rev`).
>
>After looking at the code, I don't think we need a lot of changes to
>support a system that uses a JSON which separates the metadata from the
>content of the doc. If we are OK with that I could quickly provide a patch
>which introduces that change. For compatibility it could also parse the
>full doc when no `_data` (or `_body`) member is found and make it
>transparent for the end user by reading an API version that could come in
>the headers.
>
>Anyway these are just my 2 cents on that topic. I would be more than happy
>to discuss this topic further so we could introduce such changes rapidly in
>our API (even before any merge possibly).
>
>- benoit
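
For what it's worth, the `_data` layout described in the quoted message could look roughly like the sketch below. This is only an illustration under my own assumptions, not the proposed patch: `split_document` is a hypothetical helper, and the fallback for docs without `_data` simply treats underscore-prefixed top-level keys as metadata, the way documents look today.

```python
import json


def split_document(doc):
    """Split a parsed document into (metadata, body).

    Hypothetical helper: handles the proposed `_data` envelope and falls
    back to the current flat layout when no `_data` member is present.
    """
    if "_data" in doc:
        # Proposed layout: every top-level key except `_data` is metadata,
        # and the body is taken as-is without looking inside it.
        meta = {k: v for k, v in doc.items() if k != "_data"}
        return meta, doc["_data"]
    # Current layout: underscore-prefixed top-level keys are metadata.
    meta = {k: v for k, v in doc.items() if k.startswith("_")}
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    return meta, body


new_style = json.loads(
    '{"_id": "a", "_rev": "1-abc",'
    ' "_data": {"_links": {"self": "/a"}, "title": "x"}}'
)
old_style = json.loads('{"_id": "a", "_rev": "1-abc", "title": "x"}')

new_meta, new_body = split_document(new_style)
old_meta, old_body = split_document(old_style)
```

Note that in the `_data` layout the body can carry its own underscore-prefixed keys (such as HAL's `_links`) without clashing with the reserved names, which was the problem that started the thread.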

-- 
Sent from Kaiten Mail. Please excuse my brevity.
