From: "Andrew Richards"
To: couchdb-user@incubator.apache.org
Date: Mon, 21 Jul 2008 02:32:34 -0700
Subject: Re: when to use another document and when not to?

I will offer you four points of advice that I have gleaned over the last few weeks of working with CouchDB. The first two are possible "solutions" to your problem, though they are not exactly what you are looking for.

1. Put all of the user's data into the subscription document at the time it's written. Then, when you fetch the subscription documents, the data you need will already be there.

2. Or just do a lot of GETs for all of the data you need. It actually works. (Or, even better, fetch them from something like memcached.)

While the first option does introduce duplicated data, it yields very nice performance benefits, and it is in line with how CouchDB works as a whole.
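To make the two options concrete, here is a minimal JavaScript sketch that runs in Node without a CouchDB server. It assumes user documents carry a hypothetical `fullname` field and that subscription documents use the `follower`/`followee` fields from the quoted views. The helpers `makeSubscription`, `mapFollowers`, `queryView`, `fetchUserDoc` and `getUserCached` are illustrative only, and the in-memory `Map` merely stands in for memcached. In CouchDB a map function calls the global `emit()`; here `emit` is passed in so the code runs on its own.

```javascript
// Placeholder documents; the fullname values are made up for illustration.
const users = {
  sho: { type: "user", username: "sho", fullname: "Sho Example" },
  dlandolt: { type: "user", username: "dlandolt", fullname: "Dean Example" },
  katz: { type: "user", username: "katz", fullname: "Katz Example" },
};

// Option 1 (denormalize): copy the follower's full name into the
// subscription document at the time it is written.
function makeSubscription(follower, followee) {
  return {
    type: "subscription",
    follower: follower.username,
    followee: followee.username,
    follower_fullname: follower.fullname, // duplicated on purpose
  };
}

// CouchDB-style map function keyed by followee (emit passed in, not global).
function mapFollowers(doc, emit) {
  if (doc.type === "subscription") {
    emit(doc.followee, doc.follower_fullname);
  }
}

// Tiny stand-in for querying a view with ?key=...
function queryView(docs, mapFn, key) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (k, v) => rows.push({ key: k, value: v }));
  }
  return rows.filter((row) => row.key === key);
}

// Option 2 (many GETs behind a cache): fetchUserDoc stands in for an HTTP
// GET of a user document; the Map stands in for memcached.
let backendGets = 0;
function fetchUserDoc(username) {
  backendGets += 1;
  return users[username];
}
const cache = new Map();
function getUserCached(username) {
  if (!cache.has(username)) cache.set(username, fetchUserDoc(username));
  return cache.get(username);
}

const docs = [
  ...Object.values(users),
  makeSubscription(users.sho, users.katz),
  makeSubscription(users.dlandolt, users.katz),
];

// Option 1: a single view query yields the names of everyone following katz.
const namesViaView = queryView(docs, mapFollowers, "katz").map((r) => r.value);

// Option 2: find katz's subscriptions, then one (cached) GET per follower.
const subs = docs.filter((d) => d.type === "subscription" && d.followee === "katz");
const namesViaGets = subs.map((s) => getUserCached(s.follower).fullname);

console.log(namesViaView, namesViaGets);
```

Option 1 answers "names of everyone following katz" from one view query, at the cost of rewriting subscription documents whenever someone changes their full name. Option 2 keeps a single copy of each name but needs one lookup per follower, which the cache makes cheap on repeat requests.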
How often will users change their full names? Often enough that you can't go back and rewrite their subscription documents?

The second option is faster than you think. How often will you need to get the names of more than, say, 50 subscribers at a time? Does the user viewing this data really need to see all of it on the same page? Even with a lot of documents, pulling from CouchDB is very fast, and memcached is faster still.

My other two points:

3. Big joins like this are what make relational databases slow.

4. CouchDB is not a replacement for relational databases.

On Mon, Jul 21, 2008 at 12:29 AM, Sho Fukamachi wrote:
>
> On 21/07/2008, at 1:56 PM, Dean Landolt wrote:
>
>>> Or, obviously, I would be delighted if someone could show me how I'm
>>> completely wrong and it is actually possible to do this : )
>>
>>
>> You can. Complex keys. I put together a little test:
>>
>> http://dev.deanlandolt.com:5984/test/_view/subscriptions/users
>
> Firstly, I appreciate the effort you put into your reply. Great to be able
> to see your solution in action there. And I hope you don't mind I replicated
> it so I could examine it locally : )
>
>> The map function just uses a two-level key...
>>
>> function(doc) {
>>   if (doc.type == 'user') {
>>     emit([doc.username,0], doc)
>>   } else if (doc.type == 'subscription') {
>>     emit([doc.follower, doc.followee], doc)
>>   }
>> }
>
> But all that key is doing is sorting the results, right?
>
>> Read this for more details:
>> http://www.cmlenz.net/archives/2007/10/couchdb-joins
>
> Believe me I've read that about 10 times. I still can't see how to solve the
> problem.
>
>> But yes, you can do joins. You can query this view for just one user
>> simply:
>>
>>
>> http://dev.deanlandolt.com:5984/test/_view/subscriptions/users?startkey=[%22dlandolt%22]&endkey=[%22dlandolt%22,{}]
>
> Again I think I am not making myself clear. If you look at that view, you
> see it returns 3 rows.
>
> The first row is the user whose name you have searched upon. In this case
> your own.
>
> The next two rows are the *subscription* documents, which is not what I am
> talking about. It is easy to get the subscription documents and followee
> user document for any given followee username. If you abandon sorting all
> you need is:
>
> function(doc) {
>   if (doc.type == 'user') {
>     emit(doc.username, doc)
>   } else if (doc.type == 'subscription') {
>     emit(doc.followee, doc)
>   }
> }
>
> And you get the exact same results, sans the sorting.
>
> This is not the kind of join I meant - in fact this is not a join at all, as
> I understand them. A proper join would get you the follower *user* documents
> - not just the subscriptions. As it stands if you wanted, say, the full
> names of the follower users, you are then faced with an n+1 query to look
> them up one by one. And same in reverse, if you wanted the full user
> documents of all followed users, starting with the follower's username.
> Making things worse is that CouchDB doesn't currently have the ability to do
> multi-key gets (see previous ML discussion).
>
> In other words, with a proper join, starting from the username 'katz', you
> could get back the *user* documents from both following users "sho" and
> "dlandolt". The user documents, *not* the subscription docs.
>
> This is a bit difficult to discuss without ambiguity (or resorting to SQL
> queries) so let me put it in terms of a use case "challenge" question: with
> that current DB, can you write a query that, starting with the username
> "katz", outputs the *names* (not usernames) of all users following him?
>
> *That* is a join query and that's what I can't see how to do in CouchDB.
>
> Many thanks for the discussion and sorry, again, for not explaining myself
> properly the first n times... my sincere apologies if my stubborn ignorance
> is annoying everyone here!
>
> Sho
>
>> Notice the {} in there -- from what I gather objects are at the bottom of
>> the sort, so this query gets all data related to user dlandolt -- the
>> first result (or any result with the second part of a key ending in 0
>> based on how I wrote my view), and then everything following are the
>> *subscription* docs that Damien recommended.
>>
>> Hope that helps.
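The range query in Dean's quoted message works because of CouchDB's view collation order: null, then false, true, numbers, strings, arrays (compared element by element), and finally objects. The JavaScript sketch below emulates that ordering to show why `startkey=["dlandolt"]&endkey=["dlandolt",{}]` selects the user row followed by that user's subscription rows. It is a simplification under stated assumptions: strings are compared by code point rather than with CouchDB's ICU-based collation, all objects are treated as equal, and the sample documents and helper names (`typeRank`, `collate`, `rangeQuery`) are hypothetical.

```javascript
// Rank of each JSON type in CouchDB's view collation:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Negative if a sorts before b, positive if after, 0 if equal.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  // Simplification: code-point order instead of ICU collation.
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a prefix sorts before the longer array
  }
  return 0; // null/true/false of equal rank, or (simplification) any two objects
}

// Dean's map function, with emit passed in instead of being a global.
function deansMap(doc, emit) {
  if (doc.type === "user") {
    emit([doc.username, 0], doc);
  } else if (doc.type === "subscription") {
    emit([doc.follower, doc.followee], doc);
  }
}

// Emulates ?startkey=...&endkey=... (endkey is inclusive by default).
function rangeQuery(rows, startkey, endkey) {
  return rows
    .filter((r) => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0)
    .sort((x, y) => collate(x.key, y.key));
}

// Hypothetical sample data.
const docs = [
  { type: "user", username: "dlandolt" },
  { type: "user", username: "katz" },
  { type: "user", username: "sho" },
  { type: "subscription", follower: "dlandolt", followee: "katz" },
  { type: "subscription", follower: "dlandolt", followee: "sho" },
  { type: "subscription", follower: "sho", followee: "katz" },
];

const rows = [];
for (const doc of docs) deansMap(doc, (key, value) => rows.push({ key, value }));

const keys = rangeQuery(rows, ["dlandolt"], ["dlandolt", {}]).map((r) => r.key);
// keys: [["dlandolt", 0], ["dlandolt", "katz"], ["dlandolt", "sho"]]
console.log(JSON.stringify(keys));
```

The user row comes first because the number 0 collates before any string, and `{}` in the endkey sorts after every string, so it caps the range. The matching rows are still the user document plus *subscription* documents, which is Sho's point: getting the followers' full names takes either duplicated data or extra lookups.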