couchdb-user mailing list archives

From Rajkumar S <rajkum...@gmail.com>
Subject Re: Slow Map Reduce
Date Thu, 22 Oct 2009 06:37:37 GMT
Hi,

The total number of docs is 341,948, growing at about a couple per
second. I know that first-time indexing takes some time, but I have
not yet been able to complete even a single map so far. I am now
trying a simpler map with no reduce:

function(doc) {
    if ("msgtype" in doc) {
        if (doc.msgtype == "allow") {
            if ((doc.event == "action_allowed_ip") || (doc.event == "action_allow_new")) {
                // the IP has to be read off the doc; it is not defined otherwise
                var ip = doc.parameters.client_address;
                emit(ip, 1);
            }
        }
    }
}
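
Saving and querying this map-only view from the Python side looks
roughly like this; a minimal sketch, where the database name
"maillog" and the design doc name are placeholders of mine:

import couchdb

server = couchdb.Server("http://localhost:5984/")
db = server["maillog"]  # placeholder database name

# Store the map above in a design document. With no reduce, the view
# returns one row per matching log entry, keyed by client IP.
db["_design/stats"] = {
    "language": "javascript",
    "views": {
        "allowed": {
            "map": """function(doc) {
    if ("msgtype" in doc && doc.msgtype == "allow") {
        if (doc.event == "action_allowed_ip" || doc.event == "action_allow_new") {
            emit(doc.parameters.client_address, 1);
        }
    }
}"""
        }
    }
}

# The first read triggers the index build, so this call blocks until
# every document has been mapped.
for row in db.view("stats/allowed"):
    print row.key, row.value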



On Wed, Oct 21, 2009 at 8:25 PM, Alex P <apedenko@kolosy.com> wrote:
> How many docs is that, and have you run the view incrementally?
> First-time index builds are painful...
>
> On Wed, Oct 21, 2009 at 9:46 AM, Rajkumar S <rajkumars@gmail.com> wrote:
>
>> Hello,
>>
>> I am using CouchDB 0.9.0 for storing logs from my mail server.
>>
>> Logs are sent from the mail servers to a RabbitMQ queue. A Python
>> program fetches each entry from RabbitMQ, converts it to JSON, and
>> inserts it into CouchDB using the couchdb module (from couchdb
>> import *). I keep a single document holding the entire history of
>> each email transaction, and I run multiple RabbitMQ clients, each
>> pulling from the same queue and updating the same CouchDB. This
>> means I have to update the same document from different clients
>> several times during the lifetime of an email message.
>>
>> To do this I use the message id of each mail transaction as its key
>> (it appears in every log entry). When the first log entry arrives I
>> check whether a doc with that key already exists; if not, I create
>> one. When the next log entry arrives I fetch the doc, convert it to
>> a hash table in my program, merge the new entry into the hash table,
>> and write the result back as JSON. If a conflict occurs, the program
>> retries (fetch, merge, store again) until the update goes through.
>>
>> This means for every write there is a corresponding read.
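>>
>> The retry loop looks roughly like this (a minimal sketch: the
>> database name and the merge step are placeholders, and I am assuming
>> a couchdb-python version where ResourceConflict lives in
>> couchdb.http):
>>
>> import couchdb
>>
>> db = couchdb.Server("http://localhost:5984/")["maillog"]
>>
>> def store_entry(msgid, entry):
>>     """Create or update the per-message doc, retrying on conflict."""
>>     while True:
>>         doc = db.get(msgid) or {"_id": msgid}
>>         doc.setdefault("entries", []).append(entry)  # stand-in merge
>>         try:
>>             db.save(doc)
>>             return
>>         except couchdb.http.ResourceConflict:
>>             pass  # another client won the update; re-fetch and retry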
>>
>> Currently I am running this as a pilot, with just a single server
>> logging to CouchDB. I get about 0.75 GB of data per day, with
>> GETs/PUTs happening almost continuously (say 1-2 per second).
>> Previously I had a test server running, and I tested a couple of
>> map/reduce views against that DB (about 5 MB).
>>
>> Now, after logging from a single production machine, I have not
>> been able to run a single view. I get the following error if I wait
>> long enough:
>>
>> Error: case_clause
>>
>> {{bad_return_value,{os_process_error,"OS process timed out."}},
>>  {gen_server,call,
>>             [<0.436.0>,
>>              {prompt,[<<"rereduce">>,
>>                       [<<"function(keys, values)\n{\n    return
>> values;\n}">>],.....
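>>
>> Side note: the reduce in that trace, function(keys, values) {
>> return values; }, returns its input unreduced, so each rereduce
>> pass works on an ever larger value, which by itself can trip the
>> OS-process timeout. A reduce whose output stays small would count
>> per key instead; a sketch with placeholder names, relying on sum(),
>> which CouchDB's JavaScript view server provides:
>>
>> import couchdb
>>
>> db = couchdb.Server("http://localhost:5984/")["maillog"]  # placeholder
>>
>> db["_design/counts"] = {
>>     "views": {
>>         "by_ip": {
>>             # One row per allowed event, keyed by client IP ...
>>             "map": 'function(doc) { if (doc.msgtype == "allow") { emit(doc.parameters.client_address, 1); } }',
>>             # ... reduced to a per-key count, which stays small.
>>             "reduce": 'function(keys, values, rereduce) { return sum(values); }',
>>         }
>>     }
>> }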
>>
>> I have changed os_process_timeout to 50000 and removed the reduce
>> part, but even after about 6 hours my map has not finished. The db
>> size is currently 3.6 GB.
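>>
>> For reference, os_process_timeout is in milliseconds, so 50000 is
>> 50 seconds; I set it in local.ini:
>>
>> [couchdb]
>> os_process_timeout = 50000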
>>
>> The map function I am using is:
>>
>> function(doc) {
>>     if ("msgtype" in doc) {
>>         if (doc.msgtype == "allow") {
>>             if ((doc.event == "action_allowed_ip") || (doc.event == "action_allow_new")) {
>>                 var ip = doc.parameters.client_address;
>>                 var result = {
>>                     "helo": doc.parameters.helo_name,
>>                     "event": doc.event,
>>                     "timestamp": doc.timestamp,
>>                     "id": doc._id,
>>                     "from": doc.parameters.sender,
>>                     "to": doc.parameters.recipient
>>                 };
>>                 emit(ip, result);
>>             }
>>         }
>>     }
>> }
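>>
>> Once the index does build, pulling the rows for one client IP from
>> the Python side would look something like this (database, design
>> doc, and view names are placeholders):
>>
>> import couchdb
>>
>> db = couchdb.Server("http://localhost:5984/")["maillog"]
>>
>> # key= restricts the result to the rows emitted for this IP.
>> for row in db.view("logs/allowed_events", key="192.0.2.1"):
>>     print row.value["from"], row.value["to"], row.value["timestamp"]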
>>
>> Top shows that couchjs is the most active process; right now it is
>> showing the following line:
>>
>> 11410 root      20   0 90752  27m  752 R   76  0.7   1235:05 couchjs
>>
>> My hardware is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66 GHz with
>> 4 GB of RAM and one SATA hard disk. I do not think this is the
>> expected performance of CouchDB, so is there something I am doing
>> wrong? Any tips for bringing performance up to an acceptable level?
>>
>> thanks and much regards,
>>
>> raj
>>
>
