incubator-couchdb-user mailing list archives

From Kenneth Kalmer <kenneth.kal...@gmail.com>
Subject Re: Some guidance with extremely slow indexing
Date Sun, 12 Apr 2009 16:51:40 GMT
On Sun, Apr 12, 2009 at 1:11 AM, Chris Anderson <jchris@apache.org> wrote:

> On Sat, Apr 11, 2009 at 12:06 PM, Paul Davis
> <paul.joseph.davis@gmail.com> wrote:
> > On Sat, Apr 11, 2009 at 2:58 PM, Kenneth Kalmer
> > <kenneth.kalmer@gmail.com> wrote:
> >> On Thu, Apr 9, 2009 at 5:17 PM, Paul Davis <paul.joseph.davis@gmail.com>
> >> wrote:
> >>
> >>> Kenneth,
> >>>
> >>> I'm pretty sure your issue is in the reduce steps for the daily and
> >>> monthly views. The general rule of thumb is that you shouldn't be
> >>> returning data that grows faster than log(#keys processed), whereas I
> >>> believe your data is growing linearly with input.
> >>>
> >>> This particular limitation is a result of the implementation of
> >>> incremental reductions. Basically, each key/pointer pair stores the
> >>> re-reduced value for all [re-]reduce values in its child nodes. So
> >>> as your reduction moves up the tree the data starts exploding, which
> >>> kills btree performance, not to mention the extra file I/O.
> >>>
> >>> The basic moral of the story is that if you want reduce views like
> >>> this per user you should emit a [user_id, date] pair as the key and
> >>> then call your reduce views with group=true.
> >>>
> >>> HTH,
> >>> Paul Davis
> >>>
> >>
> >> Hi Paul
> >>
> >> Thanks for taking the trouble of investigating for me. I'll dive into the
> >> views and clean them up a bit according to your advice, as well as brush up
> >> on the caveat you explained. I saw other threads in the archives where you
> >> gave similar advice; sorry for not realizing I stepped into the same trap.
> >> When I've got the issue resolved I'll update the gist and we can leave it as
> >> a point of reference for others.
> >>
> >> Thanks again!
> >>
> >
> > It's kind of a hard one to notice right away, as it's not an error; it
> > just kills performance. Perhaps Damien was right in that we should
> > think about adding log vomiting when we detect that there's a crap
> > load of data accumulating in the reductions.
> >
>
> I agree -- maybe another config setting,
> max_intermediate_reduction_size or something, so that you can raise it
> if you really know what you are doing. Unless there are hard limits,
> in which case we should just error properly when we reach them.
>
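To make Paul's rule of thumb concrete, here is a minimal sketch that runs in Node, outside CouchDB. It only imitates incremental reduction: rows are reduced in fixed-size chunks (a rough stand-in for btree nodes) and the partial results are then combined with rereduce set to true. The documents, their user/date/bytes fields and the incrementalReduce helper are all hypothetical, not taken from the original gist.

```javascript
// Runs in Node, outside CouchDB. It only imitates how partial reduce
// results are combined; it is not CouchDB's btree code.
// ASSUMPTION: the docs and their user/date/bytes fields are hypothetical.
const docs = [
  { user: "alice", date: "2009-04-10", bytes: 100 },
  { user: "bob", date: "2009-04-10", bytes: 70 },
  { user: "carol", date: "2009-04-11", bytes: 30 },
  { user: "dave", date: "2009-04-11", bytes: 20 },
];

// Stand-in for the view server's emit(): collects [key, value] rows.
let rows = [];
function emit(key, value) {
  rows.push([key, value]);
}

// Anti-pattern: the reduce value holds one entry per user/day, so it
// grows linearly with the input instead of staying small.
function badMap(doc) {
  emit(doc.date, { [doc.user + "/" + doc.date]: doc.bytes });
}
function badReduce(keys, values, rereduce) {
  const out = {};
  for (const v of values) {
    for (const [k, n] of Object.entries(v)) out[k] = (out[k] || 0) + n;
  }
  return out;
}

// Suggested shape: put [user, date] in the key and reduce to one number.
function goodMap(doc) {
  emit([doc.user, doc.date], doc.bytes);
}
function goodReduce(keys, values, rereduce) {
  // Summing works unchanged on rereduce, because partials are sums too.
  return values.reduce((a, b) => a + b, 0);
}

// Reduce fixed-size chunks of rows (hypothetical helper), then combine
// the partial results with rereduce = true.
// Keys are simplified: CouchDB passes [key, docid] pairs to reduce.
function incrementalReduce(mapFn, reduceFn, chunkSize) {
  rows = [];
  for (const doc of docs) mapFn(doc);
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    partials.push(reduceFn(chunk.map((r) => r[0]), chunk.map((r) => r[1]), false));
  }
  return reduceFn(null, partials, true);
}

const bad = incrementalReduce(badMap, badReduce, 2);
const good = incrementalReduce(goodMap, goodReduce, 2);
console.log(Object.keys(bad).length); // one entry per input row: 4
console.log(good); // a single number: 220
```

The bad reduce's value keeps every user/day pair, so each partial stored up the tree gets bigger as documents are added; the sum stays one number at every level. With the user carried in the key instead, a query with group=true still returns per-[user, date] totals, one small row each.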

Hi Paul & Chris

This would help; I'm sure a lot of people get caught in this trap
initially.

I've cleaned up my views a bit and they are much more performant now. On our
"production" couch, which currently holds 6.6 million docs, the indexing has
been running for close to 18 hours and is 80% done. I killed the previous
indexing task, since after 5 days it was only 50-something percent done, and
the database held 3.1 million docs when that task started.

After going through the docs carefully again, thinking my problem through
clearly, and taking the "emit([key, doc.user])" advice from Paul more
seriously, I got it working. The docs give the warning without any real
references, which makes it sound like a "yeah, whatever" kind of thing. This
is dangerous. However, the real gem lies in a line I picked up somewhere in
the wiki: it stresses that reduce views should build a summary, not aggregate
data, which was my mistake. I now aggregate the data in my own app with two
extra lines of code, and the views have become very powerful using
group_level. So my old 'days' and 'daily' views are now combined into a
single, more useful 'daily' view.
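A single view serving both the old 'days' and 'daily' queries could look roughly like the sketch below. It assumes a [date, user] key in the spirit of emit([key, doc.user]) and a plain sum as the reduce; the documents, field names and the queryGroupLevel helper are hypothetical, and the helper only imitates group_level in Node by grouping rows on the first N key elements. A real query would pass group_level to the view instead.

```javascript
// Runs in Node, outside CouchDB; queryGroupLevel only imitates the
// group_level query option. ASSUMPTION: hypothetical user/date/bytes docs.
const docs = [
  { user: "alice", date: "2009-04-10", bytes: 100 },
  { user: "alice", date: "2009-04-10", bytes: 50 },
  { user: "bob", date: "2009-04-10", bytes: 70 },
  { user: "alice", date: "2009-04-11", bytes: 30 },
];

// Stand-in for the view server's emit(): collects [key, value] rows.
let rows = [];
function emit(key, value) {
  rows.push([key, value]);
}

// One 'daily' view: the key is [date, user], the reduce is a plain sum.
function dailyMap(doc) {
  emit([doc.date, doc.user], doc.bytes);
}
function sumReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Group rows on the first `level` key elements and reduce each group.
// Rows keep input order here; CouchDB would return them in key order.
function queryGroupLevel(mapFn, reduceFn, level) {
  rows = [];
  for (const doc of docs) mapFn(doc);
  const groups = new Map();
  for (const [key, value] of rows) {
    const prefix = key.slice(0, level);
    const id = JSON.stringify(prefix);
    if (!groups.has(id)) groups.set(id, { key: prefix, values: [] });
    groups.get(id).values.push(value);
  }
  return [...groups.values()].map((g) => ({
    key: g.key,
    value: reduceFn(null, g.values, false),
  }));
}

const perDay = queryGroupLevel(dailyMap, sumReduce, 1);
const perDayPerUser = queryGroupLevel(dailyMap, sumReduce, 2);
console.log(perDay); // totals per date
console.log(perDayPerUser); // totals per [date, user]

// App-side aggregation on top of the summary rows: a total per user.
const perUser = {};
for (const { key, value } of perDayPerUser) {
  perUser[key[1]] = (perUser[key[1]] || 0) + value;
}
console.log(perUser);
```

The key order decides which prefixes group_level can produce: [date, user] gives per-day and per-day-per-user totals, while Paul's [user_id, date] order would instead give per-user and per-user-per-day totals. The reduce itself stays a constant-size summary either way.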

I'll update the gist as soon as my DSL is fixed at home, and I'll blog about
my learning curve as well, as soon as I can conjure up a nice example for
rereduce, which I also only figured out through this exercise.

Thanks again for helping the newbies; the willingness of everyone here to
assist definitely helps drive couch adoption.

Best

-- 
Kenneth Kalmer
kenneth.kalmer@gmail.com
http://opensourcery.co.za
