incubator-couchdb-user mailing list archives

From Kenneth Kalmer <kenneth.kal...@gmail.com>
Subject Some guidance with extremely slow indexing
Date Thu, 09 Apr 2009 14:25:44 GMT
Hi everyone

After months of lurking and reading up on couch I finally got the time to
start using it for an internal mail log analyzer. I parse the logs from our
Courier-IMAP installation and convert the different lines into documents and
this has proven to work quite well.

My first task is to extract some metrics from these docs regarding how
often people "pop" their mail, and the returned sizes of each "pop".
Documents in question look like this:

{
   "_id": "0000f68e73f3521f3ee8b3b51e0101d7",
   "_rev": "1-3732031452",
   "user": "user@example.com",
   "host": "pop-5",
   "time": "2009/03/13 05:47:08 +0000",
   "action": "LOGOUT",
   "service": "pop3d",
   "ip": "[10.0.0.1]",
   "top": "0",
   "retr": "0"
}

I've got one design document with 4 views in it. All of them have reduce steps
as well. I've placed all the code in a Gist to keep the mail clean:
http://gist.github.com/92476

Basically I get the following from the different views:

* days - Days and number of activities, used as a key lookup for...
* daily - Total aggregate usage for each user on the day
* months & monthly work the same as the above, except over months
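
As a rough illustration of the kind of view described above (this is not the
code from the Gist, which is not reproduced here), a "daily"-style view might
be sketched as follows. The [day, user] key shape, the reading of "retr" as a
byte count stored as a string, and the emit() stand-in that lets the sketch run
outside CouchDB are all assumptions.

```javascript
// Hypothetical sketch of a "daily"-style view; NOT the code from the Gist.
// Assumption: "retr" is the number of bytes retrieved, stored as a string.

// Stand-in for CouchDB's emit() so the sketch can run outside the view server.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: one row per pop3d LOGOUT, keyed by [day, user].
function map(doc) {
  if (doc.service === "pop3d" && doc.action === "LOGOUT") {
    var day = doc.time.substr(0, 10); // e.g. "2009/03/13"
    emit([day, doc.user], parseInt(doc.retr, 10) || 0);
  }
}

// Reduce: sum the emitted sizes. Partial sums are summed the same way,
// so the rereduce flag needs no special handling here.
function reduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Usage with two made-up documents in the shape shown earlier.
map({ user: "user@example.com", time: "2009/03/13 05:47:08 +0000",
      action: "LOGOUT", service: "pop3d", top: "0", retr: "1200" });
map({ user: "user@example.com", time: "2009/03/13 09:10:00 +0000",
      action: "LOGOUT", service: "pop3d", top: "0", retr: "800" });

var dailyTotal = reduce(
  rows.map(function (r) { return r.key; }),
  rows.map(function (r) { return r.value; }),
  false
);
console.log(rows[0].key, dailyTotal);
```

Keeping the map to a single cheap emit per document and the reduce output to a
plain number is one way to keep such a view inexpensive; whether that matches
the views in question is not known from this message.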

Updating the indexes is incredibly slow, and I have no idea where to begin
looking. I suspect my maps are "expensive", but since this is my first shot
I'll keep quiet and listen to any advice. With "slow" I mean that on my
local development VM (gentoo, couch 0.9, erlang R12B-5, js 1.7) processing
150,000 docs is closing in on 24 hours... On a production site I have
3,300,000 docs and over about 18 hours it has only indexed 264,091 documents
(7%). I built the views using only a couple of hundred docs, probably less
than 1,000, and didn't expect this to happen...
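
To put the figures above in perspective, the implied indexing rates can be
worked out with a few lines of arithmetic. This is only a back-of-the-envelope
sketch: it treats "closing in on 24 hours" as exactly 24 hours and assumes the
rate stays constant, which a real view build need not do. The function names
are made up for the illustration.

```javascript
// Back-of-the-envelope indexing rates from the figures quoted above.
// Assumption: a constant rate over the whole build.

// Documents indexed per second, given a count and elapsed hours.
function docsPerSecond(docs, hours) {
  return docs / (hours * 3600);
}

// Hours needed to index totalDocs at a given docs-per-second rate.
function hoursToIndex(totalDocs, rate) {
  return totalDocs / rate / 3600;
}

var devRate = docsPerSecond(150000, 24);   // development VM
var prodRate = docsPerSecond(264091, 18);  // production site so far
var prodTotalHours = hoursToIndex(3300000, prodRate);

console.log(devRate.toFixed(2) + " docs/s on the development VM");
console.log(prodRate.toFixed(2) + " docs/s on production");
console.log(Math.round(prodTotalHours) + " hours for all production docs at that rate");
```

That works out to roughly 2 docs per second on the VM and about 4 on
production, so the full production set would need on the order of nine days at
the current pace.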

From reading other posts in the archives I know the initial index can take a
while, but somehow this just seems a bit ridiculous.

Any advice would be greatly appreciated.

Thanks in advance, and thanks for the awesome tool you guys have built.

Best

-- 
Kenneth Kalmer
kenneth.kalmer@gmail.com
http://opensourcery.co.za
