incubator-couchdb-user mailing list archives

From Alex P <apede...@kolosy.com>
Subject Re: Slow Map Reduce
Date Wed, 21 Oct 2009 14:55:55 GMT
How many docs is that, and have you run the view incrementally? First-time
index builds are painful...

On Wed, Oct 21, 2009 at 9:46 AM, Rajkumar S <rajkumars@gmail.com> wrote:

> Hello,
>
> I am using CouchDB 0.9.0 to store logs from my mail server.
>
> Logs are sent from the mail servers to a RabbitMQ queue server. A
> Python program fetches each log entry from RabbitMQ, converts it to
> JSON, and inserts it into CouchDB using the couchdb module (from
> couchdb import *). I have a single document storing the entire history
> of an email transaction. I also have multiple RabbitMQ clients, each
> pulling from the same queue and updating the same CouchDB database.
> This means I have to update the same document from different clients
> several times during the lifetime of an email message.
>
> To do this I use the message id of each mail transaction as its key
> (this appears in every log entry). When the first log entry arrives I
> check whether a doc with that key is present in the db; if not, I
> create a new doc with that key. When a second log entry arrives I
> fetch the doc, convert it to a hash table in my program, merge the new
> log entry into the hash table, and update the doc with the updated
> hash table's JSON. If a conflict occurs, the program retries, fetching
> the doc, updating it, and storing it again until the conflict is
> resolved.
>
> This means for every write there is a corresponding read.
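
The fetch, merge, and retry-on-conflict cycle described above can be sketched roughly as follows. This is only an illustration, written in JavaScript to match the map function later in the message rather than the Python of the original program. FakeStore, ConflictError, and mergeLogEntry are hypothetical names, and the in-memory store merely imitates a revision check; it is not the CouchDB client API. The shallow Object.assign merge is also an assumption, since the message does not say how entries are merged.

```javascript
// Hypothetical in-memory stand-in for a store that rejects stale writes.
// It only imitates revision checking; it is not the CouchDB client API.
class ConflictError extends Error {}

class FakeStore {
  constructor() {
    this.docs = new Map();
  }
  // Returns a copy of the stored doc, or null if the id is unknown.
  get(id) {
    const doc = this.docs.get(id);
    return doc ? JSON.parse(JSON.stringify(doc)) : null;
  }
  // Accepts the write only if the caller saw the latest revision.
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      throw new ConflictError("document update conflict: " + doc._id);
    }
    const stored = JSON.parse(JSON.stringify(doc));
    stored._rev = (currentRev || 0) + 1;
    this.docs.set(doc._id, stored);
    return stored._rev;
  }
}

// Fetch, merge, store; on a conflict, fetch again and retry.
function mergeLogEntry(store, messageId, entry, maxRetries = 10) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = store.get(messageId) || { _id: messageId };
    Object.assign(doc, entry); // shallow merge (an assumption)
    try {
      return store.put(doc);
    } catch (err) {
      if (!(err instanceof ConflictError)) throw err;
      // Another client won the race; loop to re-read the latest revision.
    }
  }
  throw new Error("gave up after " + maxRetries + " conflicts: " + messageId);
}

// Two log entries for the same (made-up) message id end up in one doc.
const store = new FakeStore();
mergeLogEntry(store, "msg-1", { event: "action_allow_new" });
mergeLogEntry(store, "msg-1", { timestamp: "2009-10-21T14:55:55Z" });
```

Bounding the number of retries is a design choice of this sketch; the message itself describes retrying until the conflict is resolved.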
>
> Currently I am running it as a pilot and just have a single server
> logging to couchdb. I have about 0.75 GB per day right now, with
> GET/PUT happening almost continuously (say 1 - 2 per second).
> Previously I had a test server running and I tested a couple of
> map/reduce views using that DB (about 5 MB).
>
> Now after logging from a single production machine I am not able to
> run a single view so far. I get the following error if I wait long
> enough:
>
> Error: case_clause
>
> {{bad_return_value,{os_process_error,"OS process timed out."}},
>  {gen_server,call,
>             [<0.436.0>,
>              {prompt,[<<"rereduce">>,
>                       [<<"function(keys, values)\n{\n    return values;\n}">>],.....
>
> I have changed os_process_timeout to 50000 and removed the reduce
> part, but even after about 6 hours my map has not finished. Currently
> the db size is 3.6 GB.
>
> The map function I am using is:
>
> function(doc) {
>    if ("msgtype" in doc){
>        if (doc.msgtype == "allow"){
>            if ((doc.event == "action_allowed_ip") || (doc.event == "action_allow_new")){
>                result = {};
>                ip = doc.parameters.client_address;
>                result["helo"] = doc.parameters.helo_name;
>                result["event"] = doc.event;
>                result["timestamp"] = doc.timestamp;
>                result["id"] = doc._id;
>                result["from"] = doc.parameters.sender;
>                result["to"] = doc.parameters.recipient;
>                emit (ip,result);
>            }
>        }
>    }
> }
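
One way to check a map function like the one above before running it against the full database is to exercise it locally with a stubbed emit. The sketch below does that under a few assumptions: the emit stub, the rows array, and the sample documents are all hypothetical, and the locals are declared with var so they do not leak as globals. It says nothing about indexing speed; it only confirms which documents produce rows.

```javascript
// Collects emitted rows; stands in for the emit() the view server provides.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the message, with locals declared via var.
function map(doc) {
  if ("msgtype" in doc) {
    if (doc.msgtype == "allow") {
      if ((doc.event == "action_allowed_ip") || (doc.event == "action_allow_new")) {
        var result = {};
        var ip = doc.parameters.client_address;
        result["helo"] = doc.parameters.helo_name;
        result["event"] = doc.event;
        result["timestamp"] = doc.timestamp;
        result["id"] = doc._id;
        result["from"] = doc.parameters.sender;
        result["to"] = doc.parameters.recipient;
        emit(ip, result);
      }
    }
  }
}

// Hypothetical sample documents; field values are made up for illustration.
const samples = [
  {
    _id: "msg-1",
    msgtype: "allow",
    event: "action_allow_new",
    timestamp: "2009-10-21T09:46:00Z",
    parameters: {
      client_address: "192.0.2.10",
      helo_name: "mail.example.org",
      sender: "a@example.org",
      recipient: "b@example.net"
    }
  },
  { _id: "msg-2", msgtype: "deny", event: "action_allow_new", parameters: {} },
  { _id: "msg-3" } // no msgtype, so the map function skips it
];

samples.forEach(function (doc) { map(doc); });
console.log(JSON.stringify(rows, null, 2));
```

Only the first sample document matches both conditions, so only it yields a row.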
>
> Top shows that couchjs is most active process and it shows the
> following line right now,
> 11410 root      20   0 90752  27m  752 R   76  0.7   1235:05 couchjs
>
> My hardware is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz, 4 GB RAM
> and one SATA hard disk. I do not think this is the expected
> performance of CouchDB, so is there something I am doing wrong? Any
> tips to improve the performance to acceptable levels?
>
> thanks and much regards,
>
> raj
>
