incubator-couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Possible bug in indexer... (really)
Date Sat, 04 Jul 2009 00:35:45 GMT
On Jul 3, 2009, at 6:37 PM, Chris Anderson wrote:

> 2009/7/3 Göran Krampe <goran@krampe.se>:
>> Hi folks!
>>
>> We are writing an app using CouchDB where we tried to do some map/reduce
>> to calculate "period sums" for about 1000 different "accounts". This is
>> fiscal data btw; the system is meant to store detailed fiscal data for
>> about 50000 companies, for starters. :)
>>
>> The map function is trivial; it just emits a bunch of "accountNo, amount"
>> pairs with "month" as key.
>>
>> The reduce/rereduce takes these and builds a dictionary (JSON object) with
>> "month-accountNo" as key (like "2009/10-2335") and the sum as the value.
>> This works fine; yes, it builds up a bit, but there is a maximum of account
>> numbers and months so it doesn't grow out of control, so that is NOT the
>> issue.
>
> There is *no reason ever* to build up a dictionary with more than a
> small handful of items in it. E.g. it's OK if your dictionary has this
> fixed set of keys: count, total, stddev, avg.
>
> It's not OK to do what you are doing. This is what group_level is for.
> Rewrite your map/reduce to be correct and then we can start talking
> about performance.
>
> I don't mean to be harsh but suggesting you have a performance problem
> here is like me complaining that my Ferrari makes a bad boat.
>
> Cheers,
> Chris

Wow, that was unusually harsh coming from you, Chris.  Taking a closer  
look at Göran's map and reduce functions I agree that they should be  
reworked to make use of group=true, but nevertheless I wonder if we do  
have something to work on here.
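
For the record, the rework Chris is talking about is pretty mechanical.  
Something along these lines (the field names are just my guess at the  
document structure) lets the btree do the grouping, so the reduce value  
never grows past a single number:

    // Map: emit a compound [month, accountNo] key; the value is just the amount.
    function(doc) {
      if (doc.month && doc.accountNo && doc.amount) {
        emit([doc.month, doc.accountNo], doc.amount);
      }
    }

    // Reduce: a plain sum (the built-in _sum reduce would work too).
    function(keys, values, rereduce) {
      return sum(values);
    }

Querying the view with ?group=true (or group_level=2) then gives one row  
per month/account pair, which is the same information as the big dictionary  
but computed incrementally by the btree.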

I believe Göran's problem was that the second pass was causing the  
view updater process to use a significant amount of memory and trigger  
should_flush() immediately.  As a result, view KVs were being written  
to disk after every document (triggering the reduce/rereduce step).   
This is fantastically inefficient.  If the penalty for flushing  
results to disk during indexing is so severe, perhaps we want to be a  
little more flexible in imposing it.  There could be very legitimate  
cases where users with large documents and/or sophisticated workflows  
are hung out to dry during indexing because the view updater wants a  
measly 11MB of memory to do its job.
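
To make the mechanism concrete (this is only a sketch of the control flow  
written as JavaScript, not the actual Erlang; the helper names and the  
threshold figure are invented), the updater effectively does something like:

    // Illustrative sketch of the updater's flush check; names are assumptions.
    var pending = [];                          // KVs emitted since the last flush
    var FLUSH_THRESHOLD = 10 * 1024 * 1024;    // assumed memory threshold

    function processDoc(doc) {
      pending = pending.concat(runMapFunctions(doc));  // hypothetical helper
      if (estimatedMemoryUse() > FLUSH_THRESHOLD) {    // the "should_flush()" check
        writeToBtree(pending);  // each write reruns reduce/rereduce on touched nodes
        pending = [];
      }
    }

With a reduce value as large as Göran's, the process sits above the  
threshold almost permanently, so the flush (and the rereduce cascade)  
happens after nearly every document instead of once per batch.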

Adam

>> Ok, here comes the punchline. When we dump the first 1000 docs using bulk,
>> which typically will amount to say 5000 emits, and we "touch" the view to
>> trigger it, it will be rather fast and behaves like this:
>>
>> - a single Erlang process runs and emits all values, then it does a bunch
>> of reduces on those values, and finally it switches into rereduce mode and
>> does those, and then you can see the dictionary "growing" a bit but never
>> too much. It is pretty fast, a second or two all in all.
>>
>> Fine. Then we dump the *next* 1000 docs into Couch and trigger the view
>> again. This time it behaves like this (believe it or not):
>>
>> - two Erlang processes get into play. It seems the same process as above
>> continues with emits (IIRC), but a second one starts doing reduce/rereduce
>> *while the first one is emitting*.

This is actually by design.

>> Ouch. And to make it worse - the second one seems to gradually  
>> "take over" until we only see 2-3 emits followed by tons of  
>> rereduces (all the way up I guess for each emit).

This is not.

>> Sooo... evidently Couch decides to do stuff in parallel and starts doing
>> reduce/rereduce while emitting here. AFAIK this is not the behavior
>> described.

Not sure if it's described, but it is by design.  The reduce function  
executes when the btree is modified.  We can't afford to cache all of the  
KVs from an index update in memory regardless of their size; we have to set  
some threshold at which we flush them to disk.

I think the fundamental question is why the flush operations were  
occurring so frequently the second time around.  Is it because you  
were building up a largish hash for the reduce value?  Probably.   
Nevertheless, I'd like to have a better handle on that.
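
For context, I'm guessing the reduce looked roughly like the following  
(reconstructed from the description at the top of the thread, not Göran's  
actual code).  The important part is that the return value grows with every  
distinct month/account combination, so each flush has to serialize and  
re-merge the whole thing:

    // Presumed shape of the original reduce; the map is assumed to emit
    // (month, {accountNo: ..., amount: ...}) pairs.
    function(keys, values, rereduce) {
      var acc = {};
      for (var i = 0; i < values.length; i++) {
        var v = values[i];
        if (rereduce) {
          // v is already a dictionary of "month-accountNo" -> sum
          for (var k in v) {
            acc[k] = (acc[k] || 0) + v[k];
          }
        } else {
          // keys[i] is a [key, docid] pair, so keys[i][0] is the month
          var key = keys[i][0] + "-" + v.accountNo;
          acc[key] = (acc[key] || 0) + v.amount;
        }
      }
      return acc;  // grows with the number of month-accountNo combinations
    }

That object gets written into every btree node that holds a reduction,  
which is exactly the kind of reduce value the group=true rewrite above  
avoids.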

Adam

>> The net effect is that the view update that took 1-2 seconds suddenly
>> takes 400 seconds, or goes to a total crawl and never seems to end.
>>
>> Looking at the log, it obviously processes ONE doc at a time - giving us
>> 2-5 emits typically - and then tries to reduce that all the way up to the
>> root before processing the next doc. So the rereduces for the internal
>> nodes will typically be run, in this case, 1000x more than needed.
>>
>> Phew. :) Ok, so we are basically hosed with this behavior in this
>> situation. I can only presume this has gone unnoticed because:
>>
>> a) The updates most of us do are small. But we dump thousands of new docs
>> using bulk (a full new fiscal year of data for a given company), so we
>> definitely notice it.
>>
>> b) Most reduce/rereduce functions are very, very fast, so it goes
>> unnoticed. Our functions are NOT that fast - but if they were only run as
>> they should be (well, presuming they *should* only be run after all the
>> emits for all doc changes in a given view update) it would indeed be fast
>> anyway. We can see that, since the first 1000 docs work fine.
>>
>> ...and thanks to the people on #couchdb for discussing this with me
>> earlier today and looking at the Erlang code to try to figure it out. I
>> think Adam Kocoloski and Robert Newson had some idea about it.
>>
>> regards, Göran
>>
>> PS. I am on vacation now for 4 weeks, so I will not be answering  
>> much email.
>> I wanted to get this posted though since it is in some sense a  
>> rather ...
>> serious performance bottleneck.
>
>
>
> -- 
> Chris Anderson
> http://jchrisa.net
> http://couch.io

