couchdb-user mailing list archives

From Rajkumar S <rajkum...@gmail.com>
Subject Slow Map Reduce
Date Wed, 21 Oct 2009 14:46:55 GMT
Hello,

I am using CouchDB 0.9.0 to store logs from my mail server.

Logs are sent from the mail servers to a RabbitMQ queue server. A
Python program fetches each log entry from RabbitMQ, converts it to
JSON, and inserts it into CouchDB using the couchdb module (from
couchdb import *). A single document stores the entire history of each
email transaction. I also have multiple RabbitMQ clients, each pulling
from the same queue and updating the same CouchDB database. This means
the same document is updated by different clients several times during
the lifetime of an email message.

To do this, I use the message ID of each mail transaction (which
appears in every log entry) as the document's key. When the first log
entry arrives, I check whether a doc with that key is present in the
db; if not, I create a new doc with that key. When the second log entry
arrives, I fetch the doc, convert it to a hash table in my program,
merge the new log entry into the hash table, and update the doc with
the hash table's JSON. If a conflict occurs, the program retries: it
fetches the doc again, updates it, and stores it again until the write
succeeds without a conflict.
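The read-merge-write loop described above can be sketched as follows. This is not the author's Python code: it is a minimal JavaScript sketch (the language of the view code in this post) against a hypothetical in-memory `FakeStore` that mimics CouchDB's optimistic concurrency, where an update succeeds only if it carries the document's current `_rev` and is otherwise rejected as a conflict. `FakeStore`, `ConflictError`, `upsertWithRetry`, the shallow merge rule, and the `maxRetries` cap are all assumptions for illustration; the post retries without a stated limit and does not say how entries are merged.

```javascript
// Hypothetical in-memory stand-in for a CouchDB database. As in CouchDB,
// an update must carry the document's current _rev or it is rejected.
class ConflictError extends Error {}

class FakeStore {
  constructor() {
    this.docs = new Map(); // _id -> latest stored revision
  }

  get(id) {
    const doc = this.docs.get(id);
    // Hand back a deep copy so callers cannot mutate the stored revision.
    return doc === undefined ? null : JSON.parse(JSON.stringify(doc));
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current === undefined ? undefined : current._rev;
    if (doc._rev !== currentRev) {
      throw new ConflictError("document update conflict: " + doc._id);
    }
    // Fake revision ids of the form "N-fake" (real ones are "N-<hash>").
    const n = currentRev === undefined ? 1 : parseInt(currentRev, 10) + 1;
    const stored = JSON.parse(JSON.stringify(
      Object.assign({}, doc, { _rev: n + "-fake" })));
    this.docs.set(doc._id, stored);
    return stored._rev;
  }
}

// Read, merge, write; on conflict, re-read the latest revision and retry.
function upsertWithRetry(store, id, entry, maxRetries = 10) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // Read: latest revision, or a fresh doc keyed by the message id.
    const doc = store.get(id) || { _id: id };
    // Merge (assumed rule): shallow merge, later log fields overwrite earlier ones.
    Object.assign(doc, entry);
    try {
      // Write: succeeds only if nobody else updated the doc since the read.
      return store.put(doc);
    } catch (err) {
      if (!(err instanceof ConflictError)) throw err;
      // Another client won the race; loop round to re-read and merge again.
    }
  }
  throw new Error("giving up on " + id + " after " + (maxRetries + 1) + " attempts");
}

// Usage: two log entries for the same (made-up) message id.
const store = new FakeStore();
upsertWithRetry(store, "msg-0001", { event: "connect" });
upsertWithRetry(store, "msg-0001", { event: "action_allow_new" });
console.log(store.get("msg-0001"));
```

Each successful write costs one GET plus one PUT, and every conflict adds another round trip of both, which is the "for every write there is a corresponding read" pattern.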

This means for every write there is a corresponding read.

Currently I am running this as a pilot, with just a single server
logging to CouchDB. Right now that amounts to about 0.75 GB per day,
with GETs/PUTs happening almost continuously (say 1-2 per second).
Previously I had a test server running, and I tested a couple of
map/reduce views against that DB (about 5 MB).
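For scale, a quick back-of-envelope check of what 0.75 GB per day means per write, assuming 1 GB = 10^9 bytes and that the quoted 1-2 writes per second are spread evenly over the whole day (the post gives only these rough figures):

```javascript
// Average bytes per write, assuming writes are spread evenly over 24 hours.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

function bytesPerWrite(bytesPerDay, writesPerSecond) {
  return bytesPerDay / (writesPerSecond * SECONDS_PER_DAY);
}

for (const rate of [1, 2]) {
  const b = bytesPerWrite(0.75e9, rate);
  console.log(rate + " write(s)/s -> ~" + Math.round(b) + " bytes per write");
}
```

That is roughly 8.7 KB per write at 1 write/s and 4.3 KB at 2 writes/s. Since CouchDB's storage is append-only, each update writes a new copy of the whole (growing) document until the database is compacted.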

Now, after logging from a single production machine, I have not been
able to run a single view. If I wait long enough, I get the following
error:

Error: case_clause

{{bad_return_value,{os_process_error,"OS process timed out."}},
 {gen_server,call,
             [<0.436.0>,
              {prompt,[<<"rereduce">>,
                       [<<"function(keys, values)\n{\n    return values;\n}">>],.....
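The reduce function in the error above returns `values` unchanged, so nothing is actually reduced: the arrays passed back through each rereduce step keep growing, when a reduce is expected to return something much smaller than its input. As a minimal sketch, assuming a per-key row count is what's wanted (the post does not say what the reduce was for), a reducing version could look like this. It is written as a named function, `countReduce` (a hypothetical name), so it can be exercised outside CouchDB; in a design document you would supply only the anonymous `function(keys, values, rereduce) { ... }`.

```javascript
// Hypothetical reducing replacement for `return values;`:
// counts rows per key instead of passing every value through.
function countReduce(keys, values, rereduce) {
  if (!rereduce) {
    // First pass: each element of values is one emitted row.
    return values.length;
  }
  // Rereduce pass: values are partial counts from earlier calls.
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Local simulation: two reduce calls, then one rereduce over their results.
var partA = countReduce(null, [{ event: "a" }, { event: "b" }], false);
var partB = countReduce(null, [{ event: "c" }], false);
console.log(countReduce(null, [partA, partB], true)); // prints 3
```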

I have changed os_process_timeout to 50000 and removed the reduce
function, but even after about 6 hours the map has not finished. The
db is currently 3.6 GB.

The map function I am using is:

function(doc) {
    if ("msgtype" in doc) {
        if (doc.msgtype == "allow") {
            if ((doc.event == "action_allowed_ip") ||
                (doc.event == "action_allow_new")) {
                // Declare with var so these do not become implicit globals.
                var result = {};
                var ip = doc.parameters.client_address;
                result["helo"] = doc.parameters.helo_name;
                result["event"] = doc.event;
                result["timestamp"] = doc.timestamp;
                result["id"] = doc._id;
                result["from"] = doc.parameters.sender;
                result["to"] = doc.parameters.recipient;
                emit(ip, result);
            }
        }
    }
}

top shows that couchjs is the most active process; this is its line
right now:
11410 root      20   0 90752  27m  752 R   76  0.7   1235:05 couchjs

My hardware is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz with 4 GB
of RAM and one SATA hard disk. I do not think this is the expected
performance of CouchDB, so is there something I am doing wrong? Any
tips for improving performance to acceptable levels?

thanks and much regards,

raj
