couchdb-dev mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: Erlang vs JavaScript
Date Sun, 18 Aug 2013 14:33:32 GMT
On Sun, Aug 18, 2013 at 3:54 PM, Volker Mische <volker.mische@gmail.com> wrote:
> On 08/18/2013 08:42 AM, Alexander Shorin wrote:
>> On Sun, Aug 18, 2013 at 10:22 AM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>> On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <kxepal@gmail.com> wrote:
>>>
>>>> On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <jhs@apache.org> wrote:
>>>>> On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mische@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
>>>>>>> On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <bchesneau@gmail.com>
>>>>>>> wrote:
>>>>>>>> I agree, (modulo the fact that I would replace a string by a binary ;)
>>>>>>>> but that would be only possible if we extract the metadata (_id, _rev)
>>>>>>>> from the JSON so couchdb wouldn't have to decode the JSON to get them.
>>>>>>>> Streaming json would also allow that but since there is no guarantee
>>>>>>>> of the property order in a JSON object it would be less efficient.
>>>>>>>
>>>>>>> What if we split document metadata from document itself?
>>>>>
>>>>>
>>>>> I would like to hear a goal for this effort. What is the definition of
>>>>> success and failure?
>>>>
>>>> Idea: move document metadata into a separate object.
>>>>
>>>
>>> How do you link the metadata to the separate object there? Do you let the
>>> application set the internal links?
>>>
>>> I'm +1 on such an idea anyway.
>>
>> Mmm... here is how I imagine it (disclaimer: I'm sure I'm wrong in the details!):
>>
>> Btree:
>>
>>     -----+-----
>>     |         |
>>   --+--     --+--
>>   |   |     |   |
>>   *   *     *   *
>>
>> At the node we have a doc object {...} for a specific revision. Instead of
>> this, we'll have a tuple ({...}, {...}): the first is the meta, the second
>> is the data.
>> So I think internal links wouldn't be needed, since meta and data
>> would live within the same Btree node.
>> For regular doc requests, they will be merged (is the `_` prefix still
>> needed to avoid collisions?) and returned as a single {...} as always.
>
> We could also return them as separate objects, so the view function
> becomes: function(doc, meta) {}.
>
> Couchbase does that, and from my experience it works well and feels right.

Oh, so this idea even works (:

However, the trick was to not pass the doc part (if it is big enough)
to the view server until the view server has processed its metadata.
Otherwise this is a good feature, but it wouldn't help speed up
indexing. To recap the trick: first process the meta part, and only if
it passes, load the doc. Later I sent another mail where I eventually
reinvented chained views: since the trick with meta does exactly the
same thing, chained views are the more correct way to go. See the quote
at the end with the summary.
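
A minimal sketch of that meta-first order of operations, in plain
JavaScript rather than real CouchDB code. The names indexWithMetaFirst,
metaFilter and loadBody are hypothetical and exist only for this
illustration; the point is that the body is loaded only for documents
whose metadata passes the filter.

```javascript
// Hypothetical sketch: none of these names are CouchDB APIs.
// `entries` stands in for Btree entries, each with a small `meta` part
// and a `loadBody` function that fetches the (possibly large) doc body.
function indexWithMetaFirst(entries, metaFilter, map) {
  const rows = [];
  let bodiesLoaded = 0;
  for (const entry of entries) {
    // Step 1: look at the metadata only.
    if (!metaFilter(entry.meta)) continue;
    // Step 2: the metadata passed, so load the body now.
    const doc = entry.loadBody();
    bodiesLoaded += 1;
    // Step 3: run the map function with both parts.
    map(doc, entry.meta, (key, value) =>
      rows.push({ id: entry.meta.id, key, value })
    );
  }
  return { rows, bodiesLoaded };
}

const entries = [
  { meta: { id: "a", type: "post" }, loadBody: () => ({ title: "Hello" }) },
  { meta: { id: "b", type: "comment" }, loadBody: () => ({ text: "Hi" }) },
  { meta: { id: "c", type: "post" }, loadBody: () => ({ title: "World" }) },
];

const result = indexWithMetaFirst(
  entries,
  (meta) => meta.type === "post",
  (doc, meta, emit) => emit(doc.title, null)
);
```

Here only two of the three bodies are ever loaded, which is the whole
saving the trick is after.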

Anyway, I feel we should borrow from the Couchbase experience with a
document metadata object (if they don't sue us for that, of course ((: ),
since everyone already has some preferred metadata fields (like type)
or uses a special object for them so as not to pollute the main document
body. I prefer a special '.meta' object at the document root which holds
document type info, authorship, timestamps, bindings, etc.
It's a good feature to have whether or not it optimizes the indexing
process (:
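
To make the '.meta' idea concrete, here is a small sketch of what such
a document and a map function reading it might look like. The field
names inside meta (type, author, created_at) are my assumptions for
illustration, not an agreed schema, and emit is passed in as a
parameter only so the snippet runs outside of a view server.

```javascript
// Hypothetical document layout: the field names inside `meta` are
// examples only, not an agreed CouchDB or Couchbase schema.
const doc = {
  _id: "post-1",
  meta: {
    type: "post",
    author: "alice",
    created_at: "2013-08-16T12:00:00Z",
  },
  title: "Erlang vs JavaScript",
  body: "...",
};

// A map function in the usual CouchDB style that reads the type from
// doc.meta instead of from a top-level field.
function mapByType(doc, emit) {
  if (doc.meta && doc.meta.type) {
    emit(doc.meta.type, doc._id);
  }
}

const emitted = [];
mapByType(doc, (key, value) => emitted.push([key, value]));
```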

Below is the part about chained views:

On Fri, Aug 16, 2013 at 11:58 PM, Alexander Shorin <kxepal@gmail.com> wrote:
> Summary: I have probably just described a chained views feature with
> autoindexing by certain fields (:
> If we drop the autoindexing part, we could make the view building
> process much faster by setting up the right view chain, which uses set
> algebra operations to calculate the target doc ids to pass to the final
> view, i.e. reduce the set of docs before mapping them:
>
> {
>   "views": {
>     "posts": {"map": "...", "reduce": "..."},
>     "chain": [
>       ["by_type", {"key": "post"}],
>       ["hidden", {"key": false}],
>       ["by_domain", {"keys": ["public", "wiki"]}]
>     ]
>   }
> }
>
> For a db of 10000 docs with 1200 posts, of which 200 are hidden and 400
> are private, the resulting posts view has to process only 600 docs
> instead of 10000, and finding the docs to pass is just an index lookup
> operation. Sure, calling such a view triggers all views in the chain.
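
The set algebra in the quote can be sketched roughly as below. This is
not an existing CouchDB feature: buildIndex, lookup and resolveChain are
hypothetical helpers, each "view" is reduced to a key function, and the
synthetic data assumes the 200 hidden and 400 private posts do not
overlap, which the numbers above imply but do not state.

```javascript
// Hypothetical sketch of a view chain; not an existing CouchDB feature.
// Each index maps a JSON-encoded key to the set of doc ids that emitted it.
function buildIndex(docs, keyOf) {
  const index = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keyOf(doc));
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(doc._id);
  }
  return index;
}

// Look up one chain step: {key: k} or {keys: [k1, k2]} -> set of doc ids.
function lookup(index, query) {
  const keys = "keys" in query ? query.keys : [query.key];
  const ids = new Set();
  for (const key of keys) {
    for (const id of index.get(JSON.stringify(key)) || []) ids.add(id);
  }
  return ids;
}

// Intersect the id sets of all chain steps.
function resolveChain(indexes, chain) {
  let result = null;
  for (const [viewName, query] of chain) {
    const ids = lookup(indexes[viewName], query);
    result =
      result === null ? ids : new Set([...result].filter((id) => ids.has(id)));
  }
  return result === null ? new Set() : result;
}

// Synthetic data: 10000 docs, 1200 posts, 200 hidden, 400 private.
const docs = [];
for (let i = 0; i < 10000; i++) {
  const isPost = i < 1200;
  const isPrivate = isPost && i >= 200 && i < 600;
  docs.push({
    _id: "doc-" + i,
    type: isPost ? "post" : "comment",
    hidden: isPost && i < 200,
    domain: isPrivate ? "private" : (i % 2 ? "wiki" : "public"),
  });
}

const indexes = {
  by_type: buildIndex(docs, (d) => d.type),
  hidden: buildIndex(docs, (d) => d.hidden),
  by_domain: buildIndex(docs, (d) => d.domain),
};

const chain = [
  ["by_type", { key: "post" }],
  ["hidden", { key: false }],
  ["by_domain", { keys: ["public", "wiki"] }],
];

const targetIds = resolveChain(indexes, chain);
console.log(targetIds.size); // 600
```

Only the ids in targetIds would then be handed to the final posts view,
so its map function runs on 600 docs rather than 10000.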

--
,,,^..^,,,
