incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@grabb.it>
Subject Re: when to use another document and when not to?
Date Tue, 05 Aug 2008 05:49:50 GMT
On Mon, Aug 4, 2008 at 10:15 PM, Sho Fukamachi <sho.fukamachi@gmail.com> wrote:
> So for these reasons I think that just storing the array on both sides is a
> bad idea. From thinking about this I keep coming back to the "membership"
> doc as being a necessity. With a few improvements on the previous
> implementation.

I think the missing link here is the ability to "remap" map and
map/reduce results. In Hadoop-style map/reduce, the output of a single
map will often be remapped in different ways for different purposes.
Sharing the intermediate results across those later passes is helpful,
and people often chain long sequences of map/reduce jobs.

The challenge in supporting chained map/reduce within the CouchDB
programming model is cache expiry. When a document is changed or
deleted, how can we tell which index entries to sweep if that index
was itself generated by running map/reduce over another index? I
tell myself that the bookkeeping is possible, but it sure sounds like
a big job.
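
To make the bookkeeping concrete, here is a minimal in-memory sketch
in Python. It is not how CouchDB works and not the remap.rb code; the
names TrackedIndex, propagate, map_tagging and map_tag_row are all
made up for illustration. Each index remembers which source id
contributed which keys, so a change can sweep just those rows and
pass the touched keys on to the next index in the chain.

```python
from collections import defaultdict


class TrackedIndex:
    """Hypothetical index that records which source produced each row.

    Simplification: a source contributes at most one value per key.
    """

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = defaultdict(dict)            # key -> {source_id: value}
        self.keys_by_source = defaultdict(set)   # source_id -> emitted keys

    def sweep(self, source_id):
        """Drop every row source_id contributed; return the touched keys."""
        touched = self.keys_by_source.pop(source_id, set())
        for key in touched:
            self.rows[key].pop(source_id, None)
            if not self.rows[key]:
                del self.rows[key]
        return touched

    def update(self, source_id, doc):
        """Re-index one source. doc=None means the source was deleted."""
        touched = set(self.sweep(source_id))
        if doc is not None:
            for key, value in self.map_fn(doc):
                self.rows[key][source_id] = value
                self.keys_by_source[source_id].add(key)
                touched.add(key)
        return touched


def map_tagging(doc):
    # First-level map over tagging documents: tag -> photo id.
    yield doc["tag"], doc["photo"]


def map_tag_row(row):
    # Second-level map over first-level rows: photo count -> tag.
    yield len(row["photos"]), row["tag"]


def propagate(first, second, source_id, doc):
    """Apply a document change to `first`, then refresh only the
    second-level rows whose first-level key was touched."""
    for key in first.update(source_id, doc):
        values = first.rows.get(key)
        row = None
        if values is not None:
            row = {"tag": key, "photos": sorted(values.values())}
        second.update(key, row)


first = TrackedIndex(map_tagging)
second = TrackedIndex(map_tag_row)
propagate(first, second, "t1", {"tag": "cats", "photo": "p1"})
propagate(first, second, "t2", {"tag": "cats", "photo": "p2"})
after_inserts = {k: dict(v) for k, v in second.rows.items()}
propagate(first, second, "t1", None)   # delete tagging t1
after_delete = {k: dict(v) for k, v in second.rows.items()}
```

The cost is the extra keys_by_source table at every level of the
chain, which is roughly the "big job" part: every derived index needs
its own record of where its rows came from.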

> to me the membership (tag relationship, follower relationship, whatever)
> is a discrete piece of data and should have its own document.

Using remapping, you could keep the membership document
({user:user_id, tag:tag, photo:photo_id}) and still get to the goal,
which is a view that has photos sorted by tag, so that with ?key="tag"
you could load all the photos with a given tag. (A user's or photo's
tag cloud can come from a view directly on the tagging document.)
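
As a rough illustration of that two-pass idea, here is a small
in-memory sketch in Python. It only mimics the shape of the data flow
and does not use CouchDB or CouchRest; the sample documents and the
helper names (map_taggings, remap, build_view, query, tag_cloud) are
assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical tagging ("membership") documents, one per user/tag/photo.
taggings = [
    {"_id": "t1", "user": "alice", "tag": "cats", "photo": "p1"},
    {"_id": "t2", "user": "bob", "tag": "cats", "photo": "p2"},
    {"_id": "t3", "user": "alice", "tag": "dogs", "photo": "p1"},
]

# Hypothetical photo documents, keyed by id.
photos = {
    "p1": {"_id": "p1", "title": "Pets"},
    "p2": {"_id": "p2", "title": "Kitten"},
}


def map_taggings(doc):
    """First pass: emit (tag, photo_id) for each tagging document."""
    yield doc["tag"], doc["photo"]


def remap(rows, photo_docs):
    """Second pass: replace each photo id with the photo document."""
    for tag, photo_id in rows:
        yield tag, photo_docs[photo_id]


def build_view(rows):
    """Group emitted rows by key, like a view sorted by key."""
    view = defaultdict(list)
    for key, value in rows:
        view[key].append(value)
    return view


def query(view, key):
    """Stand-in for querying the view with ?key="tag"."""
    return view.get(key, [])


def tag_cloud(docs, field, value):
    """Count tags directly on the tagging documents for one user or photo."""
    counts = defaultdict(int)
    for doc in docs:
        if doc[field] == value:
            counts[doc["tag"]] += 1
    return dict(counts)


first_pass = [row for doc in taggings for row in map_taggings(doc)]
photos_by_tag = build_view(remap(first_pass, photos))

cat_photos = [p["_id"] for p in query(photos_by_tag, "cats")]
alice_cloud = tag_cloud(taggings, "user", "alice")
```

Here cat_photos holds the ids of the photos tagged "cats", and
alice_cloud is the per-tag count for one user, computed without the
second pass at all.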

I have a prototype of remapping (with no cache-awareness) in
CouchRest's git repo:
http://github.com/jchris/couchrest/tree/master/utils/remap.rb

We use it at Grabb.it to build join indexes for doing quick lookups.
The downside is that the index (stored in a separate logical database)
has to be regenerated whenever new records are added, because it
doesn't track which documents contributed to a given key.

You're making sense, but I also wouldn't mind code examples :)

-- 
Chris Anderson
http://jchris.mfdz.com
