From: "Chris Anderson"
To: couchdb-user@incubator.apache.org
Date: Mon, 4 Aug 2008 22:49:50 -0700
Subject: Re: when to use another document and when not to?

On Mon, Aug 4, 2008 at 10:15 PM, Sho Fukamachi wrote:
> So for these reasons I think that just storing the array on both sides is a
> bad idea. From thinking about this I keep coming back to the "membership"
> doc as being a necessity. With a few improvements on the previous
> implementation.

I think the missing link here is the ability to "remap" map and map/reduce results. In Hadoop-style map/reduce, the output of a single map will often be remapped in different ways for different purposes.
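To make the remapping idea concrete, here is a minimal in-memory sketch, assuming membership documents shaped like {user, tag, photo} with made-up values. It is not CouchDB's or CouchRest's API: `map_membership`, `run_map`, `remap` and `query` are hypothetical helpers. One map over the membership documents produces rows keyed by photo, and those same rows are then remapped twice (by tag and by user) without re-reading the documents.

```python
# Hypothetical membership documents, shaped like the "membership" doc in the quoted message.
MEMBERSHIPS = [
    {"_id": "m1", "user": "alice", "tag": "beach", "photo": "p1"},
    {"_id": "m2", "user": "bob", "tag": "beach", "photo": "p2"},
    {"_id": "m3", "user": "alice", "tag": "sunset", "photo": "p2"},
]


def map_membership(doc):
    """First-stage map: a Python stand-in for a CouchDB map function."""
    yield doc["photo"], {"user": doc["user"], "tag": doc["tag"]}


def run_map(map_fn, docs):
    """Apply a map function to every document; keep rows sorted by key like a view index."""
    return sorted((row for doc in docs for row in map_fn(doc)), key=lambda row: row[0])


def remap(rows, remap_fn):
    """Second stage: map over the rows of an existing index rather than over documents."""
    return sorted((out for key, value in rows for out in remap_fn(key, value)),
                  key=lambda row: row[0])


def query(rows, key):
    """Rough equivalent of ?key=... on a view: values of the rows whose key matches."""
    return [value for row_key, value in rows if row_key == key]


# One intermediate index, computed once...
photo_rows = run_map(map_membership, MEMBERSHIPS)
# ...then remapped two different ways for two different purposes.
by_tag = remap(photo_rows, lambda photo, v: [(v["tag"], photo)])
by_user = remap(photo_rows, lambda photo, v: [(v["user"], photo)])

print(query(by_tag, "beach"))   # ['p1', 'p2']
print(query(by_user, "alice"))  # ['p1', 'p2']
```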
Being able to share the intermediate results among further reprocessing steps is helpful, and people will often chain long stretches of map/reduce processing. The challenge in supporting chained map/reduce within the CouchDB programming model is cache expiry: how can we tell which index entries to sweep when a document is changed or deleted, if that index was itself generated by running map/reduce over another index? I tell myself that the bookkeeping is possible, but it sure sounds like a big job.

> to me the membership (tag relationship, follower relationship, whatever)
> is a discrete piece of data and should have its own document.

Using remapping, you could keep the membership document ({user:user_id, tag:tag, photo:photo_id}) and still reach the goal: a view that has photos sorted by tag, so that with ?key="tag" you could load all the photos with a given tag. (A user's or photo's tag cloud can come from a view directly on the tagging documents.)

I have a prototype of remapping (with no cache-awareness) in CouchRest's git repo:

http://github.com/jchris/couchrest/tree/master/utils/remap.rb

We use it at Grabb.it to build join indexes for quick lookups. The downside is that the index (stored in a separate logical database) has to be regenerated whenever new records are added, because it doesn't track which documents contributed to a given key.

You're making sense, but I also wouldn't mind code examples :)

-- 
Chris Anderson
http://jchris.mfdz.com
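The bookkeeping described above, which the remap.rb prototype skips, might look roughly like the following. This is a hedged in-memory sketch, not CouchRest's remap.rb and not a CouchDB API; `map_doc`, `RemappedIndex` and the `type`/`title` document fields are hypothetical. Stage one collates each photo with its membership documents under the photo id; stage two remaps each photo's group into (tag, photo) rows, i.e. a small join index. Because it records which groups each source document emitted into, a save or delete sweeps and rebuilds only the groups touched by the old or new version, instead of regenerating the whole index.

```python
from collections import defaultdict


def map_doc(doc):
    """Stage 1, a Python stand-in for a CouchDB map function (hypothetical schema):
    collate each photo with its memberships under the photo's id."""
    if doc.get("type") == "photo":
        yield (doc["_id"], 0), {"title": doc["title"]}
    elif doc.get("type") == "membership":
        yield (doc["photo"], 1), doc["tag"]


class RemappedIndex:
    """Tag -> photo join index built by remapping stage-1 rows, with per-document
    bookkeeping so a change only rebuilds the groups it touched."""

    def __init__(self):
        self.stage1 = defaultdict(dict)  # photo id -> {source doc id: [(key, value), ...]}
        self.groups_of_doc = {}          # source doc id -> photo ids it emitted into
        self.stage2 = {}                 # photo id -> [(tag, photo), ...] join rows

    def _remap_group(self, photo_id):
        """Stage 2: turn one photo's collated rows into (tag, photo) rows."""
        photo, tags = None, []
        for rows in self.stage1[photo_id].values():
            for (_, kind), value in rows:
                if kind == 0:
                    photo = value
                else:
                    tags.append(value)
        if photo is None:
            # The photo side of the join is missing, so this group emits nothing.
            self.stage2[photo_id] = []
        else:
            self.stage2[photo_id] = [(tag, {"_id": photo_id, **photo}) for tag in sorted(tags)]

    def _apply(self, doc_id, doc):
        # Sweep every stage-1 row the previous version of this document contributed.
        stale = self.groups_of_doc.pop(doc_id, set())
        for photo_id in stale:
            self.stage1[photo_id].pop(doc_id, None)
        fresh = set()
        if doc is not None:
            for key, value in map_doc(doc):
                self.stage1[key[0]].setdefault(doc_id, []).append((key, value))
                fresh.add(key[0])
            self.groups_of_doc[doc_id] = fresh
        # Only groups touched by the old or new version need remapping.
        touched = stale | fresh
        for photo_id in touched:
            self._remap_group(photo_id)
        return touched

    def save(self, doc):
        return self._apply(doc["_id"], doc)

    def delete(self, doc_id):
        return self._apply(doc_id, None)

    def query(self, key):
        """Rough equivalent of ?key="tag": ids of all photos carrying that tag."""
        return sorted(photo["_id"] for rows in self.stage2.values()
                      for tag, photo in rows if tag == key)


idx = RemappedIndex()
idx.save({"_id": "p1", "type": "photo", "title": "Beach"})
idx.save({"_id": "p2", "type": "photo", "title": "Dusk"})
idx.save({"_id": "m1", "type": "membership", "user": "alice", "tag": "beach", "photo": "p1"})
idx.save({"_id": "m2", "type": "membership", "user": "bob", "tag": "beach", "photo": "p2"})
print(idx.query("beach"))  # ['p1', 'p2']
# Moving m2 to another photo rebuilds just its old and new groups.
print(sorted(idx.save({"_id": "m2", "type": "membership", "user": "bob",
                       "tag": "sunset", "photo": "p1"})))  # ['p1', 'p2']
print(idx.query("beach"), idx.query("sunset"))  # ['p1'] ['p1']
```

Grouping by the join key is what keeps expiry cheap in this sketch: a changed document can only affect the groups it emitted into before and after the change. A remap whose output depends on many groups at once would need finer-grained tracking.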