couchdb-dev mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: Erlang vs JavaScript
Date Fri, 16 Aug 2013 19:58:04 GMT
On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <jhs@apache.org> wrote:
> On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mische@gmail.com>
> wrote:
>>
>> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
>> > On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau <bchesneau@gmail.com>
>> > wrote:
>> >> I agree, (modulo the fact that I would replace a string by a binary ;)
>> >> but
>> >> that would be only possible if we extract the metadata (_id, _rev) from
>> >> the
>> >> JSON so couchdb wouldn't have to decode the JSON to get them. Streaming
>> >> json would also allows that but since there is no guaranty in the
>> >> properties order of a JSON it would be less efficient.
>> >
>> > What if we split document metadata from document itself?
>
>
> I would like to hear a goal for this effort? What is the definition of
> success and failure?

Idea: move document metadata into a separate object.

Motivation:

Case 1: Small docs. No benefit at all. Moreover, it's probably better
not to split anything here, e.g. pass the full doc if its size is
within some number of megabytes.
Case 2: Large docs. The benefit comes when you have put the right
fields into metadata (doc type, authorship, tags, etc.) and filter by
this metadata first: you get a minimal memory footprint and less CPU
load, and the "fast accept, fast reject" rule works perfectly.

Side effect: it's possible to filter by metadata first and keep only
the ids of the documents that need processing. And if we know what and
how many docs to process, we can make assumptions about parallel
indexing.
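
A rough sketch of that "filter by metadata first" step, in Python purely
for illustration. The storage layout (a small metadata dict kept next to
an undecoded JSON body) and the names STORE, select_ids and load_bodies
are my assumptions, not anything CouchDB provides:

```python
import json

# Hypothetical layout: doc id -> (metadata dict, raw JSON body string).
# The body stays an undecoded string until a document is accepted.
STORE = {
    "a": ({"type": "post", "author": "alice"}, '{"title": "Hello", "text": "..."}'),
    "b": ({"type": "comment", "author": "bob"}, '{"text": "Nice"}'),
    "c": ({"type": "post", "author": "bob"}, '{"title": "Bye", "text": "..."}'),
}

def select_ids(store, **wanted):
    """Return ids whose metadata matches every wanted field.

    Only the metadata is inspected; bodies are never decoded here.
    """
    return [
        doc_id
        for doc_id, (meta, _body) in store.items()
        if all(meta.get(field) == value for field, value in wanted.items())
    ]

def load_bodies(store, doc_ids):
    """Decode bodies only for documents that passed the metadata filter."""
    return {doc_id: json.loads(store[doc_id][1]) for doc_id in doc_ids}

post_ids = select_ids(STORE, type="post")
posts = load_bodies(STORE, post_ids)
```

Since select_ids returns a plain list of ids, the amount of work is known
before any body is decoded, which is what would let an indexer split it
across workers.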

Side effect: it's possible to autoindex metadata on the fly at document
update, without asking the user to write views for it (meta/by_type,
meta/by_author, meta/by_update_time, etc.). Sure, the more metadata
you have, the larger the base index will be. In 80% of cases it will
be no more than 4KB.
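
To make the autoindexing idea concrete, here is a minimal in-memory
sketch in Python. The MetaIndex class and its methods are hypothetical,
and it assumes metadata values are hashable scalars (a list of tags
would need one index entry per tag):

```python
from collections import defaultdict

class MetaIndex:
    """Keeps a field -> value -> {doc ids} index current on every update."""

    def __init__(self):
        self._meta = {}  # doc id -> last seen metadata
        self._index = defaultdict(lambda: defaultdict(set))

    def update(self, doc_id, meta):
        """Re-index a document: drop its old entries, then add the new ones."""
        for field, value in self._meta.get(doc_id, {}).items():
            self._index[field][value].discard(doc_id)
        for field, value in meta.items():
            self._index[field][value].add(doc_id)
        self._meta[doc_id] = dict(meta)

    def lookup(self, field, value):
        """Return a copy of the id set for field == value (empty if none)."""
        return set(self._index.get(field, {}).get(value, set()))

idx = MetaIndex()
idx.update("a", {"type": "post", "author": "alice"})
idx.update("b", {"type": "post", "author": "bob"})
idx.update("a", {"type": "page", "author": "alice"})  # "a" changes type
```

After the last call, lookup("type", "post") contains only "b", so the
index follows the document without any user-written view.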

Summary: I have probably just described a chained views feature with
autoindexing by certain fields (:
Even without the autoindexing feature, we could make the view building
process much faster by setting up the right chain of views, which
would use set algebra operations to calculate the target doc ids to
pass to the final view, i.e. reduce the docs before mapping them:

{
  "views": {
    "posts": {"map": "...", "reduce": "..."},
    "chain": [
      ["by_type", {"key": "post"}],
      ["hidden", {"key": false}],
      ["by_domain", {"keys": ["public", "wiki"]}]
    ]
  }
}

In the case of a db of 10000 docs with 1200 posts, of which 200 are
hidden and 400 are private, the posts view has to process only 600
docs instead of 10000, and finding the docs to pass is just an index
lookup operation. Sure, calling such a view triggers all the views in
the chain. And I haven't thought about cross dependencies and loops
for now.
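
The numbers above can be checked with a small Python sketch that treats
each view in the chain as a value -> {doc ids} index and intersects the
lookups. The synthetic data and the helpers build_index and run_chain
are my own illustration; it assumes the hidden and private posts do not
overlap, which is what makes 1200 - 200 - 400 = 600:

```python
from collections import defaultdict

def build_index(docs, field):
    """Build a value -> {doc ids} index over one metadata field."""
    index = defaultdict(set)
    for doc_id, meta in docs.items():
        index[meta[field]].add(doc_id)
    return index

def run_chain(indexes, chain):
    """Intersect the id sets selected by each [view, query] step."""
    result = None
    for view_name, query in chain:
        keys = query["keys"] if "keys" in query else [query["key"]]
        # Several keys in one step act as a union (key1 OR key2).
        matched = set().union(*(indexes[view_name].get(k, set()) for k in keys))
        result = matched if result is None else result & matched
    return result if result is not None else set()

# Synthetic db: 10000 docs, 1200 posts, 200 hidden, 400 private.
docs = {}
for i in range(10000):
    if i < 200:
        docs[i] = {"type": "post", "hidden": True, "domain": "public"}
    elif i < 600:
        docs[i] = {"type": "post", "hidden": False, "domain": "private"}
    elif i < 1200:
        domain = "wiki" if i % 2 else "public"
        docs[i] = {"type": "post", "hidden": False, "domain": domain}
    else:
        docs[i] = {"type": "other", "hidden": False, "domain": "public"}

indexes = {
    "by_type": build_index(docs, "type"),
    "hidden": build_index(docs, "hidden"),
    "by_domain": build_index(docs, "domain"),
}
chain = [
    ["by_type", {"key": "post"}],
    ["hidden", {"key": False}],
    ["by_domain", {"keys": ["public", "wiki"]}],
]
target_ids = run_chain(indexes, chain)
print(len(target_ids))  # 600
```

Only the ids in target_ids would then be handed to the map function of
the final view.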

--
,,,^..^,,,
