couchdb-user mailing list archives

From Tommy Chheng <tommy.chh...@gmail.com>
Subject Re: CouchDB pegging the CPU and not responding to requests
Date Tue, 01 Sep 2009 18:32:22 GMT
Hey John,
I'm encountering a similar problem where the server or client can no
longer make the connection. My strongest theory is that my Rails
client, using couchrest-0.33, is not releasing its file descriptors, so
too many files end up open on the Linux machine.
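
One way to check the file descriptor theory is to watch how many descriptors the CouchDB process holds and compare that with its limit. The following is a minimal sketch in Python, assuming a Linux machine with /proc mounted. The function names and the proc_root parameter are made up for illustration and are not part of CouchDB or CouchRest.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd, one per open descriptor.

    Inspecting a process owned by another user usually requires root.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def parse_max_open_files(limits_text):
    """Return the soft "Max open files" limit from /proc/<pid>/limits text.

    Returns None when the limit is reported as "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are: Max open files <soft> <hard> <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise LookupError("no 'Max open files' line found")


def fd_usage(pid, proc_root="/proc"):
    """Return (open descriptor count, soft limit) for the given process."""
    with open(os.path.join(proc_root, str(pid), "limits")) as fh:
        limit = parse_max_open_files(fh.read())
    return count_open_fds(pid, proc_root), limit
```

Calling fd_usage repeatedly with the PID of the Erlang process running CouchDB while the client is active would show whether the count keeps climbing toward the limit, which is what a descriptor leak would look like.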

I am using a smaller dataset (400K docs, DB is roughly 900 MB) but with
an intensive word counting/doc similarity view.

I haven't had time to take a deeper look yet, so it's not solved for me
either. As a workaround, I just use CouchDB as a storage bin and use
Pig/Hadoop to do the computations I need.

Take a look at the "couchdb server connection refused error" thread on  
this mailing list. It might be of some help.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200908.mbox/%3C6D51F666-4324-4745-AC73-A9BEDF7BE8EC@gmail.com%3E


Tommy

On Sep 1, 2009, at 11:19 AM, John Wood wrote:

> Thanks for the reply Chris.
>
> I'll look into upgrading our test environment to the trunk version of
> CouchDB, and see if I can reproduce the error there.
>
> We're using CouchRest version 0.33 as the client library.
>
> Thanks again,
> John
>
> On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <jchris@apache.org> wrote:
>
>> On Tue, Sep 1, 2009 at 7:52 AM, John Wood <john@interactivemediums.com> wrote:
>>> Hi everybody,
>>>
>>> I'm currently facing an issue with our production installation of
>>> CouchDB. Two times within the past 5 days, the Erlang process running
>>> CouchDB pegs one of the 4 cores on the machine, consumes about 40% of
>>> the system RAM (which is 4GB), and becomes completely unresponsive to
>>> incoming HTTP requests. The only way we can get it back to normal is
>>> to restart CouchDB.
>>>
>>> I'm trying to determine what may be causing this, but I'm not having
>>> much luck. Nothing stands out in the CouchDB log files. I can see that
>>> there are no entries in the log files from the time it goes
>>> unresponsive until the time I restart it. Besides that, there doesn't
>>> appear to be any errors leading up to the issue. There are however a
>>> few errors like the one below, but none right before CouchDB goes
>>> unresponsive:
>>>
>>> [error] [<0.11738.288>] {error_report,<0.21.0>,
>>>   {<0.11738.288>,crash_report,
>>>    [[{pid,<0.11738.288>},
>>>      {registered_name,[]},
>>>      {error_info,
>>>          {error,
>>>              {case_clause,{error,enotconn}},
>>>              [{mochiweb_request,get,2},
>>>               {couch_httpd,handle_request,4},
>>>               {mochiweb_http,headers,5},
>>>               {proc_lib,init_p,5}]}},
>>>      {initial_call,
>>>          {mochiweb_socket_server,acceptor_loop,
>>>              [{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
>>>      {ancestors,
>>>          [couch_httpd,couch_secondary_services,couch_server_sup,
>>>           <0.1.0>]},
>>>      {messages,[]},
>>>      {links,[<0.56.0>,#Port<0.5032425>]},
>>>      {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
>>>      {trap_exit,false},
>>>      {status,running},
>>>      {heap_size,28657},
>>>      {stack_size,23},
>>>      {reductions,14034}],
>>>     []]}}
>>> [error] [<0.56.0>] {error_report,<0.21.0>,
>>>   {<0.56.0>,std_error,
>>>    {mochiweb_socket_server,235,
>>>        {child_error,{case_clause,{error,enotconn}}}}}}
>>>
>>> =ERROR REPORT==== 30-Aug-2009::04:29:07 ===
>>> {mochiweb_socket_server,235,
>>>                       {child_error,{case_clause,{error,enotconn}}}}
>>>
>>> I checked some of the other system log files (/var/log/messages, etc),
>>> and there doesn't appear to be any information there either.
>>>
>>> Our CouchDB installation is fairly large. We have 7 production
>>> databases, totaling almost 250GB. The largest database is 129GB. We
>>> are running CouchDB 0.9.0 on Red Hat Enterprise Server 5.3. As far as
>>> usage goes, we are constantly inserting documents into the database
>>> (5,000 at a time via a bulk insert), and pausing to regenerate the
>>> views after 100,000 documents have been inserted. Except for the
>>> process that does the inserts, all views are accessed using stale=ok.
>>>
>>> Has anybody else faced a similar issue? Can anybody suggest tips
>>> regarding how I should go about diagnosing this issue?
>>>
>>
>> Just a guess, based on the information available here, but the
>> enotconn error suggests that the remote client is dropping the
>> connection prematurely. There is an old bug about this in the tracker,
>> which might be a good thing to reopen if we learn much more about the
>> issue (and it is still present in trunk / 0.10):
>>
>> http://issues.apache.org/jira/browse/COUCHDB-45
>>
>> There is also this open bug which could be related:
>>
>> https://issues.apache.org/jira/browse/COUCHDB-394
>>
>> Perhaps you have clients who aren't properly closing the connection,
>> and then somehow this is running up against a limit in the underlying
>> server system (max number of connections, or maybe even max number of
>> Erlang processes in the VM).
>>
>> It would be nice to get to the bottom of this one, eventually.
>>
>> The first step I'd suggest taking is attempting to reproduce on the
>> 0.10.x branch from svn. This will at least tell us if the bug has been
>> fixed. If it's still around and repeatable, that will give us a test
>> case for finally crushing it into oblivion.
>>
>> It might help to know more about which client library you are using,
>> as this bug seems to depend on the TCP behavior of clients.
>>
>> Chris
>>
>>> Thanks,
>>> John
>>>
>>> --
>>> John Wood
>>> Interactive Mediums
>>> john@interactivemediums.com
>>>
>>
>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
>
>
> -- 
> John Wood
> Interactive Mediums
> john@interactivemediums.com

