Jonathan Moss
Efficient view design question
Mon, 27 Oct 2008 11:20:53 GMT
Greetings all,

I am currently writing a set of classes to handle PHP object model <->
CouchDB mapping. The PHP objects are hierarchical, and I have modelled
this as essentially a doubly linked list, so that every document within
CouchDB has a 'Children' array and a 'Parents' array. These arrays
contain the IDs of related objects.

I already have a couple of map functions to retrieve children and parents:

"childrenOf": {
    "map": "function(doc) {for(var idx in doc.Parents) {emit(doc.Parents[idx], doc);}}"
},
"parentsOf": {
    "map": "function(doc) {for(var idx in doc.Children) {emit(doc.Children[idx], doc);}}"
}
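
To make the shape of the emitted rows concrete, here is a minimal sketch that runs the two map functions over a few made-up documents in plain JavaScript. The `runMap` and `idsForKey` helpers and the sample data are hypothetical stand-ins, not part of CouchDB, and `emit` is passed as a parameter here only so the sketch runs outside the view server (in CouchDB it is a global).

```javascript
// Hypothetical harness, not CouchDB's view engine: run a map function over
// an array of documents and collect every emitted row.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Each emit() call adds one row, tagged with the emitting document's id.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  return rows;
}

// The two map functions from the design document, with emit as a parameter.
function childrenOf(doc, emit) {
  for (var idx in doc.Parents) { emit(doc.Parents[idx], doc); }
}
function parentsOf(doc, emit) {
  for (var idx in doc.Children) { emit(doc.Children[idx], doc); }
}

// Made-up sample data: one root document with two children.
const sampleDocs = [
  { _id: "root", Parents: [], Children: ["a", "b"] },
  { _id: "a", Parents: ["root"], Children: [] },
  { _id: "b", Parents: ["root"], Children: [] },
];

// Querying a view by key is modelled here as filtering the rows on that key.
function idsForKey(rows, key) {
  return rows.filter((row) => row.key === key).map((row) => row.id);
}

const childrenOfRoot = idsForKey(runMap(childrenOf, sampleDocs), "root");
const parentsOfA = idsForKey(runMap(parentsOf, sampleDocs), "a");
console.log(childrenOfRoot, parentsOfA);
```

Note that each child document emits one row per parent, keyed by the parent's ID, so asking "childrenOf" for a parent ID returns the rows emitted by its children.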

These functions return whole documents. My understanding of views is 
that these views would have to be re-generated every time a document is 
added, removed or updated. If this is the case then when the number of 
documents in the database starts getting larger, the initial response 
time to retrieve one of these views would become considerable. In a
small system where writes are uncommon and reads are regular, this would
not be an issue. However, I am struggling to find more than a handful of
niche applications where this would be true. In almost all web
applications I have written, almost every request to the website will
result in something (even if it is just tracking data) being written to 
the database. On a high volume website this would result in views having 
to be re-created almost constantly. Therefore efficient view design 
becomes paramount.

The view functions shown above return the whole doc, which I know is
inefficient. In fact, since I already have the document I want the
children/parents of, I also already have all the child/parent IDs. Would 
it be much more efficient to simply retrieve the parent/child documents 
individually rather than having to re-generate views all the time?
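
The two options in this paragraph can be sketched side by side. This is only an illustration under stated assumptions: `childrenOfLean` is a hypothetical variant of the map function that emits `null` instead of the document body, and `fetchRelated` uses a plain `Map` as a stand-in for the database, where each lookup would correspond to one GET of a document by its ID against a real server.

```javascript
// Hypothetical lean variant of childrenOf: emit only the key, with null as
// the value, so no copy of the document body is stored in the view rows.
// emit is a parameter here so the sketch runs outside CouchDB.
function childrenOfLean(doc, emit) {
  for (var idx in doc.Parents) { emit(doc.Parents[idx], null); }
}

// Direct lookup: the document in hand already lists its related IDs, so
// fetch each related document by ID and skip the view entirely.
// `store` is a hypothetical Map of id -> document standing in for the DB.
function fetchRelated(store, doc, field) {
  const ids = doc[field] || [];
  return ids
    .map((id) => store.get(id))
    .filter((related) => related !== undefined); // ignore dangling IDs
}

// Made-up sample data: one root document with two children.
const docsById = new Map([
  ["root", { _id: "root", Parents: [], Children: ["a", "b"] }],
  ["a", { _id: "a", Parents: ["root"], Children: [] }],
  ["b", { _id: "b", Parents: ["root"], Children: [] }],
]);

const rootChildren = fetchRelated(docsById, docsById.get("root"), "Children");
const parentsOfChild = fetchRelated(docsById, docsById.get("a"), "Parents");

const leanRows = [];
childrenOfLean(docsById.get("a"), (key, value) => leanRows.push([key, value]));

console.log(rootChildren.map((d) => d._id), parentsOfChild.map((d) => d._id), leanRows);
```

The trade-off in the direct approach is one lookup per related ID, against a single view query that depends on the view being up to date.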

As a side question: having to re-generate views constantly in this kind
of situation could prove a real issue. I know that CouchDB is still
pre-1.0 and the developers are necessarily focusing on 'getting it
right' before 'getting it fast' (to coin a phrase :) but will the speed
improvements already on the roadmap make these worries moot except in
very large databases, or is it always going to be an issue and therefore
require some clever application design, e.g. keeping frequently updated
data in a traditional SQL DB and only keeping rarely updated data in
CouchDB? That would be a shame.


