couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Some guidance with extremely slow indexing
Date Sat, 11 Apr 2009 23:11:05 GMT
On Sat, Apr 11, 2009 at 12:06 PM, Paul Davis
<paul.joseph.davis@gmail.com> wrote:
> On Sat, Apr 11, 2009 at 2:58 PM, Kenneth Kalmer
> <kenneth.kalmer@gmail.com> wrote:
>> On Thu, Apr 9, 2009 at 5:17 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>
>>> Kenneth,
>>>
>>> I'm pretty sure your issue is in the reduce steps for the daily and
>>> monthly views. The general rule of thumb is that you shouldn't be
>>> returning data that grows faster than log(#keys processed), whereas I
>>> believe your data is growing linearly with the input.
>>>
>>> This particular limitation is a result of the implementation of
>>> incremental reductions. Basically, each key/pointer pair stores the
>>> re-reduced value for all [re-]reduce values in its child nodes. So
>>> as your reduction moves up the tree the data starts exploding, which
>>> kills btree performance, not to mention the extra file I/O.
>>>
>>> The basic moral of the story is that if you want reduce views like
>>> this per user, you should emit a [user_id, date] pair as the key and
>>> then call your reduce views with group=true (see the sketch after
>>> this message).
>>>
>>> HTH,
>>> Paul Davis
>>>
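A minimal sketch of the pattern Paul describes, written as CouchDB
design-document view functions in JavaScript. The field names (user_id,
created_at) and the document shape are assumptions for illustration, not
taken from Kenneth's actual gist. First the anti-pattern, then the fix:

    // Anti-pattern: the reduce output carries every input value upward,
    // so intermediate reductions grow linearly with the rows covered and
    // bloat the inner btree nodes Paul mentions above.
    function (keys, values, rereduce) {
      return values;
    }

    // Fix, step 1 (map): emit one row per event, keyed by [user_id, date].
    // created_at is assumed to be an ISO 8601 string, so its first ten
    // characters are the "YYYY-MM-DD" day.
    function (doc) {
      if (doc.user_id && doc.created_at) {
        emit([doc.user_id, doc.created_at.slice(0, 10)], 1);
      }
    }

    // Fix, step 2 (reduce): return a single number. The reduction stays
    // constant-size however many keys it covers, comfortably inside the
    // log(#keys) rule of thumb.
    function (keys, values, rereduce) {
      return sum(values);
    }

Queried with ?group=true this gives one count per [user_id, date] pair,
and ?group_level=1 rolls the same index up to per-user totals. (CouchDB's
built-in _sum reduce, where available, does the same job as the
JavaScript reduce without the query-server round-trip.)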
>>
>> Hi Paul
>>
>> Thanks for taking the trouble to investigate for me. I'll dive into the
>> views and clean them up according to your advice, as well as brush up
>> on the caveat you explained. I saw other threads in the archives where
>> you gave similar advice; sorry for not realizing I'd stepped into the
>> same trap. When I've got the issue resolved I'll update the gist and we
>> can leave it as a point of reference for others.
>>
>> Thanks again!
>>
>
> It's kind of a hard one to notice right away as it's not an error, it
> just kills performance. Perhaps Damien was right in that we should
> think about adding log vomiting when we detect that there's a crap
> load of data accumulating in the reductions.
>

I agree -- maybe another config setting, max_intermediate_reduction_size
or something, so that you can raise it if you really know what you are
doing. Unless there are hard limits, in which case we should just error
properly when we reach them.
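
For illustration, here is roughly the check such a setting would
perform, sketched in JavaScript. The function name, threshold handling,
and error text are hypothetical; this is not actual CouchDB code or
configuration.

    // Hypothetical sketch of a max_intermediate_reduction_size check:
    // measure the serialized reduction and fail loudly once it exceeds
    // the configured limit, instead of silently degrading performance.
    function checkReductionSize(reduction, maxBytes) {
      var size = JSON.stringify(reduction).length;
      if (size > maxBytes) {
        throw('intermediate reduction is ' + size + ' bytes, over the ' +
              maxBytes + '-byte limit; reduce output should shrink, not grow');
      }
      return reduction;
    }

Later CouchDB releases added a check in this spirit: the query server's
reduce_limit option, which errors when reduce output fails to shrink
quickly enough.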

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io
