couchdb-dev mailing list archives

From Robert Keizer <rob...@keizer.ca>
Subject Re: Erlang vs JavaScript
Date Sun, 18 Aug 2013 17:49:13 GMT
On 13-08-18 09:33 AM, Alexander Shorin wrote:
> On Sun, Aug 18, 2013 at 3:54 PM, Volker Mische <volker.mische@gmail.com> wrote:
>> On 08/18/2013 08:42 AM, Alexander Shorin wrote:
>>> On Sun, Aug 18, 2013 at 10:22 AM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>>> On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <kxepal@gmail.com> wrote:
>>>>
>>>>> On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <jhs@apache.org> wrote:
>>>>>> On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mische@gmail.com>
>>>>>> wrote:
>>>>>>> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
>>>>>>>> On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <bchesneau@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> I agree (modulo the fact that I would replace a string by a binary ;)
>>>>>>>>> but that would only be possible if we extract the metadata (_id, _rev)
>>>>>>>>> from the JSON so couchdb wouldn't have to decode the JSON to get them.
>>>>>>>>> Streaming JSON would also allow that, but since there is no guarantee
>>>>>>>>> of property order in JSON it would be less efficient.
>>>>>>>> What if we split document metadata from document itself?
>>>>>>
>>>>>> I would like to hear a goal for this effort? What is the definition of
>>>>>> success and failure?
>>>>> Idea: move document metadata into separate object.
>>>>>
>>>> How do you link the metadata to the separate object there? Do you let the
>>>> application set the internal links?
>>>>
>>>> I'm +1 with such idea anyway.
>>> Mmm...how I imagine it (Disclaimer: I'm sure I'm wrong in details there!):
>>>
>>> Btree:
>>>
>>>      ----+----
>>>     |        |
>>>   --+--    --+--
>>> |    |  |    |
>>> *    *  *    *
>>>
>>> At the node we have a doc object {...} for a specific revision. Instead of
>>> this, we'll have a tuple ({...}, {...}): the first is the meta, the second
>>> is the data.
>>> So I think no internal links would be needed, since meta and data
>>> would live within the same Btree node.
>>> For regular doc requests, they will be merged (is the `_` prefix still
>>> needed to avoid collisions?) and returned as a single {...} as always.
>> We could also return them as separate objects, so the view function
>> becomes: function(doc, meta) {}.
>>
>> Couchbase does that and from my experience it works well and feels right.
> Oh, so this idea even works (:
>
> However, the trick was to not pass the doc part (in case it is big
> enough) to the view server until the view server has processed its
> metadata. Otherwise this is a good feature, but it wouldn't help
> speed up indexing. To recap the trick: first process the meta part and,
> if it passes, load the doc. Later I sent another mail where I
> eventually reinvented chained views; since the trick with meta does
> exactly the same thing, chained views are the more correct way to go. See
> the quote at the end with the summary.
>
> Anyway, I feel we need to build on Couchbase's experience with a document
> metadata object (of course if they wouldn't sue us for that ((: )
> since everyone already has some preferred metadata fields (like type)
> or uses a special object for that to avoid polluting the main document body.
> I prefer a special '.meta' object at the document root which holds
> document type info, authorship, timestamps, bindings, etc.
> It's a good feature to have whether or not it optimizes the indexing
> process (:

I would suggest either prefixing with an underscore, or the use of a
separate object passed to the view server.

If someone (such as myself) has a large number of documents which happen
to contain a "meta" attribute, it would be non-trivial to upgrade /
migrate. A migration script could be written of course, although it
wouldn't be ideal.

Something to consider: it may be worthwhile to simply use obj._meta
instead of .meta.
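
To make the collision concern concrete, here is a rough sketch of the "separate object" option. Nothing here is an existing CouchDB API: splitDoc, mapPosts and the explicit emit argument are hypothetical names used only so the example runs on its own, and treating every underscore-prefixed key as metadata is an assumption.

```javascript
// Hypothetical helper: split a stored document into (meta, body).
// Assumption: underscore-prefixed keys are metadata; everything else,
// including a user-defined "meta" attribute, stays in the body.
function splitDoc(stored) {
  const meta = {};
  const body = {};
  for (const [key, value] of Object.entries(stored)) {
    if (key.startsWith("_")) {
      meta[key.slice(1)] = value; // "_id" becomes meta.id
    } else {
      body[key] = value;
    }
  }
  return { meta, body };
}

// Hypothetical two-argument map function in the style of function(doc, meta) {}.
// emit is passed in only so the sketch runs outside a view server.
function mapPosts(doc, meta, emit) {
  if (doc.type === "post") {
    emit(meta.id, doc.meta); // doc.meta is the user's own attribute
  }
}

const stored = {
  _id: "post-1",
  _rev: "1-abc",
  type: "post",
  meta: { tags: ["couchdb"] }, // pre-existing user attribute named "meta"
};

const { meta, body } = splitDoc(stored);
const rows = [];
mapPosts(body, meta, (key, value) => rows.push({ key, value }));
console.log(JSON.stringify(rows));
```

With a separate argument (or an underscore prefix), an existing "meta" attribute is left alone and no migration is needed.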

>
> Below is about chained views:
>
> On Fri, Aug 16, 2013 at 11:58 PM, Alexander Shorin <kxepal@gmail.com> wrote:
>> Summary: I've probably just described a chained views feature with
>> autoindexing by certain fields (:
>> Leaving out the autoindexing feature, we could make the view building
>> process much faster if we set up the right view chain, which would use set
>> algebra operations to calculate the target doc ids to pass to the final
>> view, i.e. reduce the set of docs before mapping:
>>
>> {
>>   "views": {
>>     "posts": {"map": "...", "reduce": "..."},
>>     "chain": [
>>       ["by_type", {"key": "post"}],
>>       ["hidden", {"key": false}],
>>       ["by_domain", {"keys": ["public", "wiki"]}]
>>     ]
>>   }
>> }
>>
>> In the case of a 10000-doc db with 1200 posts, where 200 are hidden and 400
>> are private, the resulting posts view has to process only 600 docs instead
>> of 10000, and finding the docs to pass is an index lookup operation.
>> Sure, calling such a view triggers all views in the chain.
>

Chained views would be awesome! I'm sure I'm not alone in having solved 
this problem by using multiple queries and matching document IDs.
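
For anyone who hasn't done it, the client-side workaround looks roughly like the sketch below. It is only an illustration: intersectIds is a hypothetical helper, the rows are made up, and the { id, key } row shape is assumed to mirror what a view query returns.

```javascript
// Intersect the document IDs returned by several view queries.
// Each element of results is an array of rows shaped like { id, key }.
function intersectIds(results) {
  if (results.length === 0) return [];
  const [first, ...rest] = results.map(
    (rows) => new Set(rows.map((row) => row.id))
  );
  // Keep only the IDs that appear in every other result set.
  return [...first].filter((id) => rest.every((set) => set.has(id)));
}

// Made-up rows standing in for three separate queries, one per link of
// the quoted chain: by_type (key "post"), hidden (key false) and
// by_domain (keys "public", "wiki").
const byType = [
  { id: "a", key: "post" },
  { id: "b", key: "post" },
  { id: "c", key: "post" },
];
const notHidden = [
  { id: "a", key: false },
  { id: "c", key: false },
  { id: "d", key: false },
];
const byDomain = [
  { id: "a", key: "public" },
  { id: "d", key: "wiki" },
];

const targetIds = intersectIds([byType, notHidden, byDomain]);
console.log(targetIds);
```

This costs one round trip per view plus the matching in the application, which is the work a chained view would move into the server.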
