Mailing-List: contact couchdb-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: couchdb-user@incubator.apache.org
Message-ID: <e2111bbb0810270628t15010f33y9f3c3f631121b811@mail.gmail.com>
Date: Mon, 27 Oct 2008 09:28:26 -0400
From: "Paul Davis" <paul.joseph.davis@gmail.com>
To: couchdb-user@incubator.apache.org
Subject: Re: Efficient view design question
In-Reply-To: <4905BD43.5020509@tangentlabs.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <1ac9e0120810261220h1528d201ta638c181b654829@mail.gmail.com>
	 <e282921e0810262232k3c2129a2q95c129cabe5ec66b@mail.gmail.com>
	 <0CFA983F-A3EA-43F5-9B80-76F818F546A4@apache.org>
	 <4905A415.2090208@tangentlabs.co.uk>
	 <e2111bbb0810270559h7151975uc090fcfffb6a5bfd@mail.gmail.com>
	 <4905BD43.5020509@tangentlabs.co.uk>

Jonathan,

That's there too. Same patch even. You can post an array of keys to
any defined or temporary view as well as _all_docs. Not sure if it's in
the wiki yet or not.

Note: The post body should include something like

{"keys": ["key1", "key2"]}

And if you're hitting _all_docs, key1... would be document ids.
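
A minimal sketch of what such a request could look like. The helper name
buildKeysRequest, the database name "mydb", and the localhost URL are
assumptions made for illustration; the view path is passed in rather than
hard-coded because it depends on your CouchDB version. The helper only
describes the request and does not send it.

```javascript
// Hypothetical helper: describes a multi-key POST without sending it.
// `viewPath` is whatever path your CouchDB version uses for the view,
// or "_all_docs" to look documents up by id.
function buildKeysRequest(baseUrl, db, viewPath, keys) {
  return {
    method: "POST",
    url: baseUrl + "/" + encodeURIComponent(db) + "/" + viewPath,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: keys }),
  };
}

// Example: ask _all_docs for two documents by id (names are placeholders).
const request = buildKeysRequest(
  "http://localhost:5984", "mydb", "_all_docs", ["key1", "key2"]
);
console.log(request.method, request.url);
console.log(request.body);
```

The resulting object can be handed to whatever HTTP client you already use.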

Paul

On Mon, Oct 27, 2008 at 9:08 AM, Jonathan Moss
<jonathan.moss@tangentlabs.co.uk> wrote:
> Paul,
>
> That makes sense :)
>
> As for using the include_docs parameter, that is certainly one option. I also
> believe I saw something mentioned a while ago about being able to retrieve
> multiple docs from a single GET request by providing a series of Ids. Was
> this just in discussion, or does it already exist? I figure if I already
> have the Ids then I do not need to use a view for this.
>
> Thanks,
>
> Jon
>>
>> Jonathan,
>>
>> First off, to allay your main concern, view indexes are not completely
>> regenerated on each update. It's only a diff.
>>
>> So, say we have a database with some built view. If a document X
>> changes in the db, the view server deletes any rows in the view that
>> came from doc X, then runs the map function on the new version of the
>> doc, adding back any of the rows.
>>
>> This way, each time you request a view, it's only updating the
>> data that's changed since the last view request.
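
The delete-then-remap step described above can be modelled in a few lines.
This is a toy in-memory sketch, not CouchDB's actual implementation: the
createView helper, the row shape, and passing emit as a callback are all
assumptions made so the example runs on its own.

```javascript
// Toy model of an incrementally maintained view (not CouchDB's real code).
function createView(mapFn) {
  const rows = []; // each row: { id, key, value }
  return {
    rows: rows,
    // Re-index one changed document: drop its old rows, then re-run map.
    update: function (doc) {
      for (let i = rows.length - 1; i >= 0; i--) {
        if (rows[i].id === doc._id) rows.splice(i, 1);
      }
      mapFn(doc, function (key, value) {
        rows.push({ id: doc._id, key: key, value: value });
      });
    },
  };
}

// Hypothetical stand-in for the "childrenOf" map function from the thread.
const childrenOf = createView(function (doc, emit) {
  for (const parentId of doc.Parents || []) emit(parentId, null);
});

childrenOf.update({ _id: "a", Parents: ["root"] });
childrenOf.update({ _id: "b", Parents: ["root"] });
// Doc "a" changes: only its rows are replaced, doc "b" is left alone.
childrenOf.update({ _id: "a", Parents: ["other"] });
console.log(childrenOf.rows);
```

The point of the sketch is that the cost of an update scales with the
documents that changed, not with the size of the whole view.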
>>
>> Other than that, as you point out, emitting the entire doc isn't
>> overly efficient. One thing to consider is the relatively recent addition
>> of the include_docs parameter. Also, there's a wiki page on working
>> with hierarchical data that's got some good ideas.
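
As a rough illustration of the include_docs approach: the map function can
emit null instead of the whole doc, and the query asks for the documents to
be attached at read time. In the sketch below, the emit stub and the
childrenOfMap and viewQuery names are assumptions added so the code can be
run locally; inside CouchDB, emit is supplied by the view server.

```javascript
// Stand-in for the emit() that the view server provides; used here only
// so the map function can be exercised outside CouchDB.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Emit only the parent id as key; the row already carries the child's id,
// so the document body does not need to be stored in the index.
function childrenOfMap(doc) {
  if (!doc.Parents) return;
  for (var i = 0; i < doc.Parents.length; i++) {
    emit(doc.Parents[i], null);
  }
}

// Hypothetical helper: query string asking for one key with docs attached.
function viewQuery(parentId) {
  return "?key=" + encodeURIComponent(JSON.stringify(parentId)) +
         "&include_docs=true";
}

childrenOfMap({ _id: "child1", Parents: ["p1", "p2"] });
console.log(emitted);
console.log(viewQuery("p1"));
```

The trade-off is a smaller index at the cost of an extra document lookup
per row when the view is read.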
>>
>> HTH,
>> Paul Davis
>>
>> On Mon, Oct 27, 2008 at 7:20 AM, Jonathan Moss
>> <jonathan.moss@tangentlabs.co.uk> wrote:
>>
>>>
>>> Greetings all,
>>>
>>> I am currently writing a set of classes to handle a PHP object model <->
>>> CouchDB mapping. The PHP objects are hierarchical and I have modelled this
>>> as essentially a doubly linked list, so that every document within CouchDB
>>> has a 'Children' array and a 'Parents' array. These arrays contain the Ids
>>> of related objects.
>>>
>>> I already have a couple of map functions to retrieve children and
>>> parents:
>>>
>>> "childrenOf": {
>>>     "map": "function(doc) {for(var idx in doc.Parents) {emit(doc.Parents[idx], doc);}}"
>>> },
>>> "parentsOf": {
>>>     "map": "function(doc) {for(var idx in doc.Children) {emit(doc.Children[idx], doc);}}"
>>> }
>>>
>>> These functions return whole documents. My understanding of views is that
>>> these views would have to be re-generated every time a document is added,
>>> removed or updated. If this is the case, then when the number of documents
>>> in the database starts getting larger, the initial response time to
>>> retrieve one of these views would become considerable. In a small system
>>> where writes are uncommon and reads regular, this would not be an issue.
>>> However, I am struggling to find more than a handful of niche applications
>>> where this would be true. In almost all web applications I have written,
>>> almost every request to the website will result in something (even if it
>>> is just tracking data) being written to the database. On a high-volume
>>> website this would result in views having to be re-created almost
>>> constantly. Therefore efficient view design becomes paramount.
>>>
>>> The view functions shown above return the whole doc, which I know is
>>> inefficient. In fact, since I already have the document I want the
>>> children/parents of, I also already have all the child/parent IDs. Would
>>> it be much more efficient to simply retrieve the parent/child documents
>>> individually rather than having to re-generate views all the time?
>>>
>>> As a side question: having to re-generate views constantly in this kind
>>> of situation could prove a real issue. I know that CouchDB is still a
>>> pre-1.0 release and the developers are necessarily focusing on 'getting
>>> it right' before 'getting it fast' (to coin a phrase :) but will
>>> improvements in speed already on the roadmap make these worries moot
>>> except in very large databases, or is it always going to be an issue and
>>> therefore require some clever application design, e.g. keeping frequently
>>> updated data in a traditional SQL DB and only keeping rarely updated data
>>> in CouchDB? That would be a shame.
>>>
>>> Thanks,
>>> Jon