incubator-couchdb-user mailing list archives

From Troy Kruthoff <>
Subject Re: when to use another document and when not to?
Date Mon, 21 Jul 2008 16:03:58 GMT

On Jul 21, 2008, at 6:07 AM, Bradford Winfrey wrote:

> I know I had lobbied for multi-key GETs before, what I've done in  
> the meantime is basically the n+1 problem.  My only gripe with it,  
> is that I have to keep appending documents into a hash as I could  
> load the same document twice but, since I'm using the doc._id as the  
> key of the hash I just let it overwrite whatever's there, then sort  
> the values accordingly.  It doesn't seem to be too bad performance  
> wise, but there's only a trickle of traffic to my server (and even  
> that is only running 256Mb of RAM).  It still seems to do the trick  
> for the time being, if I knew more Erlang I'd try to patch it up -  
> heck, even though I don't I still want to give it another shot.

Big ditto here, but I wanted to add that we implemented a runtime  
identity map and it is working pretty well... e.g. if you abstract all  
the GET requests to couch, you can check to see if your hash already  
has the id to at least eliminate calling couch twice for the same  
document in a single session.  As soon as we finish our proof of  
concept, we plan to introduce ourselves to erlang and get couch doing  
multi-gets, which we view as a "performance optimization".  I've seen  
this brought up a few times so I have to believe somebody is going to  
get it done sooner or later.
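
A minimal sketch of such an identity map, for illustration only. It is not the implementation mentioned above: `IdentityMap`, `fetchDoc`, and `fakeCouch` are hypothetical names, and the fetch is a synchronous in-memory stand-in for what would really be an HTTP GET to CouchDB.

```javascript
// Hypothetical identity map: remembers each document by id for one session,
// so a repeated request for the same id never reaches the database twice.
class IdentityMap {
  constructor(fetchDoc) {
    this.fetchDoc = fetchDoc; // assumed: function(id) -> document (stand-in for a GET)
    this.docs = new Map();    // id -> document already loaded in this session
    this.fetchCount = 0;      // how many times the backing store was actually hit
  }

  get(id) {
    if (!this.docs.has(id)) {
      this.fetchCount += 1;
      this.docs.set(id, this.fetchDoc(id));
    }
    return this.docs.get(id);
  }
}

// In-memory stand-in for the database; the documents are invented sample data.
const fakeCouch = {
  sho: { _id: 'sho', type: 'user' },
  dlandolt: { _id: 'dlandolt', type: 'user' },
  katz: { _id: 'katz', type: 'user' },
};

const session = new IdentityMap((id) => fakeCouch[id]);
const wanted = ['sho', 'dlandolt', 'sho', 'katz', 'dlandolt'];
const loaded = wanted.map((id) => session.get(id));

// 5 documents handed back, but only 3 fetches against the store.
console.log(loaded.length, session.fetchCount);
```

This only removes duplicate requests; it still issues one request per distinct id, so it does not replace a real multi-key GET.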

> ----- Original Message ----
> From: Sho Fukamachi <>
> To:
> Sent: Monday, July 21, 2008 5:56:51 AM
> Subject: Re: when to use another document and when not to?
> On 21/07/2008, at 7:32 PM, Andrew Richards wrote:
>> 1. Put all of the user's data into the subscription document at
>> the time it's written. Then, when you get the subscription documents,
>> the necessary data will be right there.
> Yes - and the advantage of this method is that you can also write
> "options" in there (such as the aforementioned "email on update" etc,
> since we all seem to be discussing hypothetical Twitter clones!). I
> agree data storage space is not really an issue we should care about
> much these days, especially with small things like user account info.
>> 2. Or, just do a lot of GETs for all of the data you need. It actually
>> works. (Or even better, get them from something like memcached).
> Hm, I would lean away from any n+1 situation like this. Memcached is
> fast but it's not unlimited .. it's much more elegant to be able to
> get it all at once if at all possible ..
>> While the first one does indeed introduce replicated data, it will
>> yield very nice performance benefits (this is in line with how CouchDB
>> works as a whole.) How often will users change their full names?
>> Enough so that you can't go back and rewrite their subscription
>> documents?
> Agreed on all points.
>> The second one is faster than you think. How often will you need to
>> get the names of more than, say, 50 subscribers at a time? Does the
>> user viewing this data really need to be able to see all of this data
>> on the same page? Even with a lot of documents, pulling from CouchDB
>> is very fast. Memcached much more so.
> There's still unavoidable HTTP overhead - I doubt executing a loop and
> pulling 100 (or worse, 1000) documents consecutively from any source
> that's not main memory on the local machine would be good enough for
> Amazon's 300ms rule of thumb (a good one, IMO). But it is an option.
>> 3. Big joins like this are what make relational databases slow.
> Oh, no doubt about it. I actually excised all joins from an RDBMS app
> about 6 months ago for this very reason, preferring to just pull the
> data in 2 or 3 stages. The difference is that I could do multi-key
>> 4. CouchDB is not a replacement for relational databases.
> Definitely, and none of this curbs my enthusiasm for it. All of these
> nuances can be designed around, possibly using some of the excellent
> suggestions you've given. Personally the ease of replication of
> CouchDB databases is such a draw that it renders this discussion
> practically moot for me anyway - I'm going to use it, JOINs or not,
> replication is that important to me.
> However - everyone's been bandying around "couchdb supports JOINs!"
> and I wanted to either find out what I was missing, or get it settled
> one way or the other : )
> thanks again.
> Sho
>> On Mon, Jul 21, 2008 at 12:29 AM, Sho Fukamachi <> wrote:
>>> On 21/07/2008, at 1:56 PM, Dean Landolt wrote:
>>>>> Or, obviously, I would be delighted if someone could show me how I'm
>>>>> completely wrong and it is actually possible to do this : )
>>>> You can. Complex keys. I put together a little test:
>>> Firstly, I appreciate the effort you put into your reply. Great to be
>>> able to see your solution in action there. And I hope you don't mind I
>>> replicated it so I could examine it locally : )
>>>> The map function just uses a two-level key...
>>>> function(doc) {
>>>>   if (doc.type == 'user') {
>>>>     emit([doc.username,0], doc)
>>>>   } else if (doc.type == 'subscription') {
>>>>     emit([doc.follower, doc.followee], doc)
>>>>   }
>>>> }
>>> But all that key is doing is sorting the results, right?
>>>> Read this for more details:
>>> Believe me I've read that about 10 times. I still can't see how to
>>> solve the problem.
>>>> But yes, you can do joins. You can query this view for just one user
>>>> simply:
>>>> [%22dlandolt%22]&endkey=[%22dlandolt%22,{}]
>>> Again I think I am not making myself clear. If you look at that view,
>>> you see it returns 3 rows.
>>> The first row is the user whose name you have searched upon. In this
>>> case your own.
>>> The next two rows are the *subscription* documents, which is not what
>>> I am talking about. It is easy to get the subscription documents and
>>> followee user document for any given followee username. If you abandon
>>> sorting all you need is:
>>> function(doc) {
>>>   if (doc.type == 'user') {
>>>     emit(doc.username, doc)
>>>   } else if (doc.type == 'subscription') {
>>>     emit(doc.followee, doc)
>>>   }
>>> }
>>> And you get the exact same results, sans the sorting.
>>> This is not the kind of join I meant - in fact this is not a join at
>>> all, as I understand them. A proper join would get you the follower
>>> *user* documents - not just the subscriptions. As it stands if you
>>> wanted, say, the full names of the follower users, you are then faced
>>> with an n+1 query to look them up one by one. And same in reverse, if
>>> you wanted the full user documents of all followed users, starting with
>>> the follower's username.
>>> Making things worse is that CouchDB doesn't currently have the ability
>>> to do multi-key gets (see previous ML discussion).
>>> In other words, with a proper join, starting from the username 'katz',
>>> you could get back the *user* documents from both following users "sho"
>>> and "dlandolt". The user documents, *not* the subscription docs.
>>> This is a bit difficult to discuss without ambiguity (or resorting to
>>> SQL queries) so let me put it in terms of a use case "challenge"
>>> question: with that current DB, can you write a query that, starting
>>> with the username "katz", outputs the *names* (not usernames) of all
>>> users following him?
>>> *That* is a join query and that's what I can't see how to do in
>>> CouchDB.
>>> Many thanks for the discussion and sorry, again, for not explaining
>>> myself properly the first n times... my sincere apologies if my
>>> stubborn ignorance is annoying everyone here!
>>> Sho
>>>> Notice the {} in there -- from what I gather objects are at the bottom
>>>> of the sort, so this query gets all data related to user dlandolt --
>>>> the first result (or any result with the second part of a key ending
>>>> in 0 based on how I wrote my view), and then everything following are
>>>> the *subscription* docs that Damien recommended.
>>>> Hope that helps.
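
To make the "challenge" in the quoted thread concrete, here is a rough in-memory sketch of the n+1 approach: one view lookup for the subscriptions of a followee, then one user lookup per follower. Everything here is an assumption for illustration: the sample documents and their `name` values are invented, `runView` is a hypothetical stand-in for CouchDB's view engine, and `emit` is passed in as an argument rather than being the global it is inside CouchDB.

```javascript
// Invented sample documents, shaped like the ones discussed in the thread.
const docs = [
  { _id: 'u1', type: 'user', username: 'katz', name: 'Name of katz' },
  { _id: 'u2', type: 'user', username: 'sho', name: 'Name of sho' },
  { _id: 'u3', type: 'user', username: 'dlandolt', name: 'Name of dlandolt' },
  { _id: 's1', type: 'subscription', follower: 'sho', followee: 'katz' },
  { _id: 's2', type: 'subscription', follower: 'dlandolt', followee: 'katz' },
  { _id: 's3', type: 'subscription', follower: 'katz', followee: 'sho' },
];

// The simpler map function from the thread, with emit passed in explicitly.
function byFollowee(doc, emit) {
  if (doc.type == 'user') {
    emit(doc.username, doc);
  } else if (doc.type == 'subscription') {
    emit(doc.followee, doc);
  }
}

// Hypothetical stand-in for the view engine: collect every emitted row.
function runView(mapFn, allDocs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of allDocs) mapFn(doc, emit);
  return rows;
}

// Stage 1: one view query by key. Stage 2: one user lookup per follower.
function followerNames(username) {
  const rows = runView(byFollowee, docs).filter((row) => row.key === username);
  const followers = rows
    .filter((row) => row.value.type === 'subscription')
    .map((row) => row.value.follower);

  let lookups = 0; // counts the "n" in n+1
  const names = followers.map((follower) => {
    lookups += 1;
    return docs.find((d) => d.type === 'user' && d.username === follower).name;
  });
  return { names: names.sort(), lookups };
}

console.log(followerNames('katz'));
```

The view alone only yields the follower usernames; the full names require the second stage, which is the extra round trip per follower that the thread is complaining about.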
