Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Received-SPF: pass (nike.apache.org: domain of bchesneau@gmail.com designates
 209.85.128.50 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAHdjipK-4uJ-BSD-bT3eJUcdrwgCcYYcELgaWiOBofMwsVwzxg@mail.gmail.com>
References: 
 <CAPNuaCjQ3wgAd1BMyNUtip0rztK6n3Ex3rnq-hc=NXwL4Wt+pg@mail.gmail.com>
	<CACLy94Uq9eafd_Q-LXV8b3aCipnHgMnaWdn2C1LjnvuMQ-csiA@mail.gmail.com>
	<CAG+HO1zySzJyQMnSj9hfeHVR-XNB2Hpqr97A5ugabvo-wGKdXg@mail.gmail.com>
	<CACLy94XLwHcDyP12igsvCPJ5p46=3=HdoW3xAz_+2RgXH6s=pA@mail.gmail.com>
	<CACLy94Vf6TOz80cM_gWn26nFvMe8A4uQwQNNZQSBRBgYzXB9zg@mail.gmail.com>
	<CAPNuaCiQV2xoDj2O5vg+a-pDzj5ywCwmc+To16ybLeUh4Uc4eA@mail.gmail.com>
	<CACLy94UZtdV+nUUQzoYTiuUP3L=cLy-OjTUC_+6FdBNyAp2fhA@mail.gmail.com>
	<CAPNuaCjavr_udkm_XzO3SMU9gKMfdNnCM1cC71MUQhmtmP5scQ@mail.gmail.com>
	<CACLy94XC=6QFk=FxE+c9Wh41G9C_KosMk8j-tn9-_gia8outRQ@mail.gmail.com>
	<CABvT1DHPsTi=4WwwhTysmZkOda-R6FqsA10NXFyYTUQJR5wGJQ@mail.gmail.com>
	<57E7BFC7-8B8E-4014-8569-B03F99B73E35@apache.org>
	<CAJNb-9okMpQ6Tvxfbia2m_QzT-DVHJ0WfZirPeaHOx6=6TtX3g@mail.gmail.com>
	<520DEB55.8090203@gmail.com>
	<CAJNb-9qqniVvXrmTS2PV5sHvfKBTSHWzb9QHDFxkAgHGZ4+8Jw@mail.gmail.com>
	<CAHdjipJth_AWdfDwqvXrnNsDOhikY0kE-rqWxCmCj__4c5Q6=w@mail.gmail.com>
	<520DF58D.6040308@gmail.com>
	<CACLy94W8+Gt71AUUA1OccCUvYBbpJ2f2stHrEPv1XWdWnP79XQ@mail.gmail.com>
	<CAHdjipK-4uJ-BSD-bT3eJUcdrwgCcYYcELgaWiOBofMwsVwzxg@mail.gmail.com>
Date: Sun, 18 Aug 2013 08:22:53 +0200
Message-ID: 
 <CAJNb-9oUh8g21TPjAqL=KHqtkbxQKkBp5PTcpgan9r4jVKk91Q@mail.gmail.com>
Subject: Re: Erlang vs JavaScript
From: Benoit Chesneau <bchesneau@gmail.com>
To: "dev@couchdb.apache.org" <dev@couchdb.apache.org>
Cc: Jason Smith <jhs@apache.org>
Content-Type: multipart/alternative; boundary=047d7b6da3d2960c8d04e432da72

--047d7b6da3d2960c8d04e432da72
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Aug 16, 2013 at 9:58 PM, Alexander Shorin <kxepal@gmail.com> wrote:

> On Fri, Aug 16, 2013 at 11:23 PM, Jason Smith <jhs@apache.org> wrote:
> > On Fri, Aug 16, 2013 at 4:49 PM, Volker Mische <volker.mische@gmail.com>
> > wrote:
> >>
> >> On 08/16/2013 11:32 AM, Alexander Shorin wrote:
> >> > On Fri, Aug 16, 2013 at 1:12 PM, Benoit Chesneau
> >> > <bchesneau@gmail.com> wrote:
> >> >> I agree (modulo the fact that I would replace a string with a
> >> >> binary ;) but that would only be possible if we extract the
> >> >> metadata (_id, _rev) from the JSON so CouchDB wouldn't have to
> >> >> decode the JSON to get them. Streaming JSON would also allow that,
> >> >> but since there is no guarantee of property order in a JSON object
> >> >> it would be less efficient.
> >> >
> >> > What if we split document metadata from document itself?
> >
> >
> > I would like to hear a goal for this effort. What is the definition of
> > success and failure?
>
> Idea: move document metadata into a separate object.
>

How do you link the metadata to the separate object there? Do you let the
application set the internal links?

I'm +1 on such an idea anyway.
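
To make the linking question concrete, here is a minimal sketch of one
possible arrangement, not how CouchDB stores documents. It assumes the
metadata and the body are two records keyed by the same document id, so
no application-managed link is needed. The class SplitStore and its
methods are hypothetical names used only for illustration.

```python
import json


class SplitStore:
    """Hypothetical store: small decoded metadata, opaque undecoded body."""

    def __init__(self):
        self.meta = {}    # doc id -> dict, small and already decoded
        self.bodies = {}  # doc id -> bytes, raw JSON decoded only on demand

    def put(self, doc_id, rev, meta, body):
        # _id and _rev live with the metadata, so reading them never
        # requires decoding the (possibly large) body.
        self.meta[doc_id] = dict(meta, _id=doc_id, _rev=rev)
        self.bodies[doc_id] = json.dumps(body).encode("utf-8")

    def select(self, predicate):
        # Fast accept / fast reject: only metadata is inspected here.
        return [doc_id for doc_id, m in self.meta.items() if predicate(m)]

    def load(self, doc_id):
        # The shared doc id is the only link between the two parts.
        return json.loads(self.bodies[doc_id])


store = SplitStore()
store.put("a", "1-x", {"type": "post"}, {"text": "hello"})
store.put("b", "1-y", {"type": "comment"}, {"text": "hi"})
post_ids = store.select(lambda m: m["type"] == "post")
posts = [store.load(doc_id) for doc_id in post_ids]
```

Only the bodies of the selected ids are decoded, which is where the
saving for large docs would come from under these assumptions.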


> Motivation:
>
> Case 1: Small docs. No profit at all. Moreover, it's probably better
> not to split things there, e.g. pass the full doc if its size is around
> some amount of megabytes.
> Case 2: Large docs. Profit when you have put the right fields into
> metadata (like doc type, authorship, tags etc.) and filter first by
> this metadata: you get a minimal memory footprint and less CPU load,
> and the "fast accept - fast reject" rule works perfectly.
>
> Side effect: it's possible to filter by metadata first and keep only
> the document ids that need processing. And if we know what and how
> many to process, we may make assumptions about parallel indexing.
>
> Side effect: it's possible to autoindex metadata on the fly on document
> update without asking the user to write views (meta/by_type,
> meta/by_author, meta/by_update_time etc.). Sure, the more metadata you
> have, the larger the base index will be. In 80% of cases it will be no
> more than 4KB.
>
> Summary: I've probably just described a chained views feature with
> autoindexing by certain fields (:
> Set the autoindexing feature aside and we could still make the view
> building process much faster if we build the right chain of views, one
> which uses set algebra operations to calculate the target doc ids to
> pass to the final view, i.e. reduce the docs before mapping them:
>
> {
>   "views": {
>     "posts": {"map": "...", "reduce": "..."},
>     "chain": [
>       ["by_type", {"key": "post"}],
>       ["hidden", {"key": false}],
>       ["by_domain", {"keys": ["public", "wiki"]}]
>     ]
>   }
> }
>
> In the case of a 10000-doc db with 1200 posts, of which 200 are hidden
> and 400 are private, the resulting posts view has to process only 600
> docs instead of 10000, and finding the docs to pass is an index lookup
> operation. Sure, calling such a view triggers all views in the chain.
> And I'm not thinking about cross dependencies and loops for now.
>
> --
> ,,,^..^,,,
>
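
The chain quoted above can be sketched as plain set algebra over
per-field indexes. This is only an illustration under stated
assumptions, not an existing CouchDB feature: each chained view is
modelled as a mapping from key to a set of doc ids, "keys" means a union
over those keys, and the steps are intersected. The helpers build_index,
run_chain and sample_docs are hypothetical, and the sample data assumes
the 200 hidden and 400 private posts do not overlap.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the set of doc ids that carry it."""
    index = defaultdict(set)
    for doc_id, meta in docs.items():
        index[meta[field]].add(doc_id)
    return index


def run_chain(indexes, chain):
    """Intersect the id sets selected by each (view, query) step."""
    result = None
    for view_name, query in chain:
        index = indexes[view_name]
        keys = query["keys"] if "keys" in query else [query["key"]]
        # Union over the requested keys, then intersect with earlier steps.
        step_ids = set().union(*(index.get(k, set()) for k in keys))
        result = step_ids if result is None else result & step_ids
    return result if result is not None else set()


def sample_docs():
    """10000 docs: 1200 posts, of which 200 hidden and 400 private."""
    docs = {}
    for i in range(10000):
        if i < 200:
            meta = {"type": "post", "hidden": True, "domain": "public"}
        elif i < 600:
            meta = {"type": "post", "hidden": False, "domain": "private"}
        elif i < 1200:
            domain = "public" if i % 2 else "wiki"
            meta = {"type": "post", "hidden": False, "domain": domain}
        else:
            meta = {"type": "comment", "hidden": False, "domain": "public"}
        docs["doc-%05d" % i] = meta
    return docs


docs = sample_docs()
indexes = {
    "by_type": build_index(docs, "type"),
    "hidden": build_index(docs, "hidden"),
    "by_domain": build_index(docs, "domain"),
}
chain = [
    ("by_type", {"key": "post"}),
    ("hidden", {"key": False}),
    ("by_domain", {"keys": ["public", "wiki"]}),
]
target_ids = run_chain(indexes, chain)
print(len(target_ids))  # 600 docs left for the final "posts" view
```

With this sample data the final view would only see 600 of the 10000
docs, and each step is an index lookup rather than a scan.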

--047d7b6da3d2960c8d04e432da72--