From: John Wood
Date: Tue, 1 Sep 2009 09:52:27 -0500
Subject: CouchDB pegging the CPU and not responding to requests
To: user@couchdb.apache.org

Hi everybody,

I'm currently facing an issue with our production installation of CouchDB.
Twice in the past 5 days, the Erlang process running CouchDB has pegged one of the 4 cores on the machine, consumed about 40% of the system's 4GB of RAM, and become completely unresponsive to incoming HTTP requests. The only way we can get it back to normal is to restart CouchDB.

I'm trying to determine what may be causing this, but I'm not having much luck. Nothing stands out in the CouchDB log files. There are no entries in the log files from the time it goes unresponsive until the time I restart it, and there don't appear to be any errors leading up to the issue. There are, however, a few errors like the one below, but none right before CouchDB goes unresponsive:

[error] [<0.11738.288>] {error_report,<0.21.0>,
    {<0.11738.288>,crash_report,
     [[{pid,<0.11738.288>},
       {registered_name,[]},
       {error_info,
           {error,
               {case_clause,{error,enotconn}},
               [{mochiweb_request,get,2},
                {couch_httpd,handle_request,4},
                {mochiweb_http,headers,5},
                {proc_lib,init_p,5}]}},
       {initial_call,
           {mochiweb_socket_server,acceptor_loop,
               [{<0.56.0>,#Port<0.148>,#Fun}]}},
       {ancestors,
           [couch_httpd,couch_secondary_services,couch_server_sup,
            <0.1.0>]},
       {messages,[]},
       {links,[<0.56.0>,#Port<0.5032425>]},
       {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
       {trap_exit,false},
       {status,running},
       {heap_size,28657},
       {stack_size,23},
       {reductions,14034}],
      []]}}

[error] [<0.56.0>] {error_report,<0.21.0>,
    {<0.56.0>,std_error,
     {mochiweb_socket_server,235,
         {child_error,{case_clause,{error,enotconn}}}}}}

=ERROR REPORT==== 30-Aug-2009::04:29:07 ===
{mochiweb_socket_server,235,
    {child_error,{case_clause,{error,enotconn}}}}

I checked some of the other system log files (/var/log/messages, etc.), and there doesn't appear to be any relevant information there either.

Our CouchDB installation is fairly large. We have 7 production databases, totaling almost 250GB. The largest database is 129GB. We are running CouchDB 0.9.0 on Red Hat Enterprise Server 5.3.
As far as usage goes, we are constantly inserting documents into the database (5,000 at a time via a bulk insert), pausing to regenerate the views after 100,000 documents have been inserted. Apart from the process that does the inserts, all views are accessed using stale=ok.

Has anybody else faced a similar issue? Can anybody suggest how I should go about diagnosing it?

Thanks,
John

--
John Wood
Interactive Mediums
john@interactivemediums.com
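
P.S. To make the write pattern concrete, here is a rough sketch of how the load described above is batched. It is a simplification for illustration only, not our actual loader: `plan_writes`, the action names, and the document shape are all made up, it assumes the view regeneration repeats after every 100,000 inserted documents, and it only plans the work without making any HTTP calls. The `{"docs": [...]}` payload is the body shape that CouchDB's `_bulk_docs` endpoint accepts.

```python
from itertools import islice

BATCH_SIZE = 5_000        # documents per bulk insert (from the description above)
REFRESH_EVERY = 100_000   # assumed: views are regenerated after every 100,000 docs


def plan_writes(docs, batch_size=BATCH_SIZE, refresh_every=REFRESH_EVERY):
    """Yield ("bulk_insert", payload) and ("refresh_views", None) actions.

    Hypothetical helper: it only plans the work and performs no HTTP calls.
    """
    docs = iter(docs)
    since_refresh = 0
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            break
        # Request body shape for POST /<db>/_bulk_docs.
        yield "bulk_insert", {"docs": batch}
        since_refresh += len(batch)
        if since_refresh >= refresh_every:
            # Stand-in for querying each view without stale=ok so it rebuilds.
            yield "refresh_views", None
            since_refresh = 0


if __name__ == "__main__":
    kinds = [kind for kind, _ in plan_writes({"n": i} for i in range(200_000))]
    print(kinds.count("bulk_insert"), kinds.count("refresh_views"))
```

Every other client reads the views with stale=ok, so in this model the only requests that trigger a view rebuild are the ones behind the "refresh_views" step.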