Subject: Re: {error,emfile} on CouchDB 1.2.x
From: Jan Lehnardt
To: dev@couchdb.apache.org
Date: Sun, 18 Mar 2012 21:48:50 +0100
Message-Id: <5D7F6BB2-EBFE-40F2-A6B1-33BA8A5972DF@apache.org>
References: <4F6624AA.7050202@gmail.com>

On Mar 18, 2012, at 21:46, Randall Leeds wrote:
> On Sun, Mar 18, 2012 at 13:39, Jan Lehnardt wrote:
>
>>
>> On Mar 18, 2012, at 21:28, Randall Leeds wrote:
>>
>>> On Sun, Mar 18, 2012 at 11:08, Stefan Kögl wrote:
>>>
>>>> Hi,
>>>>
>>>> Another thing I noticed during my tests of CouchDB 1.2.x. I redirected
>>>> live traffic to the instance and after a rather short time, requests
>>>> were failing with the following information in the logs:
>>>>
>>>>
>>>> [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
>>>> {error_report,<0.31.0>,
>>>> {<0.27554.2>,std_error,
>>>> [{application,mochiweb},
>>>> "Accept failed error",
>>>> "{error,emfile}"]}}
>>>> [Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
>>>> {error_report,<0.31.0>,
>>>> {<0.27554.2>,crash_report,
>>>> [[{initial_call,
>>>> {mochiweb_acceptor,init,
>>>> ['Argument__1','Argument__2',
>>>> 'Argument__3']}},
>>>> {pid,<0.27554.2>},
>>>> {registered_name,[]},
>>>> {error_info,
>>>> {exit,
>>>> {error,accept_failed},
>>>> [{mochiweb_acceptor,init,3},
>>>> {proc_lib,init_p_do_apply,3}]}},
>>>> {ancestors,
>>>> [couch_httpd,couch_secondary_services,
>>>> couch_server_sup,<0.32.0>]},
>>>> {messages,[]},
>>>> {links,[<0.129.0>]},
>>>> {dictionary,[]},
>>>> {trap_exit,false},
>>>> {status,running},
>>>> {heap_size,233},
>>>> {stack_size,24},
>>>> {reductions,244}],
>>>> []]}}
>>>>
>>>>
>>>> I think "emfile" means that CouchDB (or mochiweb?) couldn't open any
>>>> more files / connections. I've set the (hard and soft) nofile limit for
>>>> user couchdb to 4096, but didn't raise the ERL_MAX_PORTS accordingly.
>>>> Anyway, as soon as the error occurred, CouchDB started writing most of my
>>>> view files from scratch, rendering the instance unusable.
>>>>
>>>> I'd expect CouchDB to fail more gracefully when the maximum number of
>>>> open files is reached. Is this a bug or expected behaviour?
>>>>
>>>
>>> Looks like a bug. Whenever there's a problem opening a view file,
>>> couch_view tries to delete it.
>>> Clearly, this is not the right course of
>>> action when the problem is due to emfile.
>>
>> This looks rather serious. I opened a JIRA:
>>
>> https://issues.apache.org/jira/browse/COUCHDB-1445
>>
>> And started collecting the info. Bob N's message came in in the meantime
>> and I agree, we should see if there's more cases where we need to be
>> careful.
>>
>> Also, I'd consider this blocking for 1.2.0.
>>
>> Anyone who can pitch in with their expertise is more than welcome! :)
>>
>
> Assigned to me. Patch forthcoming. Agree it should block 1.2.0, especially
> because upgrades are the sort of things where bad packaging downstream
> might cause custom ERL_MAX_PORTS settings to be overwritten and we wouldn't
> want anyone's production to have its views erased needlessly.

Thanks for taking this on Randall!

Cheers
Jan
--

>
> -Randall
>
>
>>
>> Cheers
>> Jan
>> --
>>
>>
>>>
>>> Here's a patch that I propose might fix it. I'd like to hear from another
>>> dev on this, or if there's a better way we should bail out.
>>>
>>> diff --git a/src/couchdb/couch_view_group.erl b/src/couchdb/couch_view_group.erl
>>> index 97fc512..ab075bd 100644
>>> --- a/src/couchdb/couch_view_group.erl
>>> +++ b/src/couchdb/couch_view_group.erl
>>> @@ -469,6 +469,10 @@ open_index_file(RootDir, DbName, GroupSig) ->
>>> case couch_file:open(FileName) of
>>> {ok, Fd} -> {ok, Fd};
>>> {error, enoent} -> couch_file:open(FileName, [create]);
>>> + {error, emfile} ->
>>> + ?LOG_ERROR("Could not open file for view index: max open files reached. "
>>> + "Raise ERL_MAX_PORTS or system limits.", []),
>>> + throw({error, emfile});
>>> Error -> Error
>>> end.
>>
>>
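
The idea behind the proposed patch, that a failure to open an index file should be classified before anything destructive happens, can be sketched outside of CouchDB. The following is a minimal illustration in Python, not CouchDB's Erlang code: the names `action_for_open_error` and `open_index_file` and the three action labels are hypothetical and chosen only for this example. It assumes that a missing file (ENOENT) is the only case where creating a fresh index is appropriate, and that descriptor exhaustion (EMFILE, ENFILE) says nothing about the file itself.

```python
import errno

# Errors meaning the process or the system ran out of file descriptors.
# They describe the state of the process, not the state of the file.
RESOURCE_EXHAUSTION = {errno.EMFILE, errno.ENFILE}


def action_for_open_error(exc: OSError) -> str:
    """Map an open() failure to a coarse action label (hypothetical labels)."""
    if exc.errno == errno.ENOENT:
        return "create"      # file is missing: a new, empty index is fine
    if exc.errno in RESOURCE_EXHAUSTION:
        return "propagate"   # out of descriptors: leave the file untouched
    return "report"          # anything else: surface it, do not guess


def open_index_file(path: str):
    """Open an existing index file, creating it only if it does not exist."""
    try:
        return open(path, "r+b")
    except OSError as exc:
        if action_for_open_error(exc) == "create":
            # "x" is exclusive creation, so an existing file is never truncated.
            return open(path, "x+b")
        # Both "propagate" and "report" re-raise; the caller decides what to
        # do, but nothing here deletes or rebuilds the index.
        raise
```

A caller could catch the re-raised `OSError`, log a hint about raising the descriptor limit when `exc.errno` is `errno.EMFILE`, and retry later, which mirrors the intent of the log message in the patch above.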