From: "Robert Newson (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Date: Tue, 15 Dec 2009 14:37:18 +0000 (UTC)
Message-ID: <2045735277.1260887838365.JavaMail.jira@brutus>
Subject: [jira] Commented: (COUCHDB-597) Replication tasks crash.
In-Reply-To: <1826646313.1260661158168.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/COUCHDB-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790760#action_12790760 ]

Robert Newson commented on COUCHDB-597:
---------------------------------------

Replication tasks fail even when executed serially, as long as the databases are large enough (1.3 GB in this case). The fourth replication task has crashed.

Stack traces from the end of my log while a replication task is hung/crashed:

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.49.0>] ** Generic server couch_task_status terminating
** Last message in was {#Ref<0.0.1832.61391>,3}
** When Server state == nil
** Reason for termination ==
** {function_clause, [{couch_task_status,handle_info,[{#Ref<0.0.1832.61391>,3},nil]}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.45.0>] {error_report,<0.23.0>, {<0.45.0>,supervisor_report, [{supervisor,{local,couch_primary_services}}, {errorContext,child_terminated}, {reason, {function_clause, [{couch_task_status,handle_info,[{#Ref<0.0.1832.61391>,3},nil]}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}}, {offender, [{pid,<0.49.0>}, {name,couch_task_status}, {mfa,{couch_task_status,start_link,[]}}, {restart_type,permanent}, {shutdown,brutal_kill}, {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:51 GMT] [error] [<0.2720.204>] {error_report,<0.23.0>, {<0.2720.204>,crash_report, [[{initial_call,{couch_task_status,init,['Argument__1']}}, {pid,<0.2720.204>}, {registered_name,couch_task_status}, {error_info,{exit,{{badmatch,[]}, [{couch_task_status,handle_cast,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}},
{ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]}, {messages,[]}, {links,[<0.45.0>]}, {dictionary,[]}, {trap_exit,false}, {status,running}, {heap_size,377}, {stack_size,24}, {reductions,127}], []]}}

[Tue, 15 Dec 2009 07:08:51 GMT] [error] [<0.45.0>] {error_report,<0.23.0>, {<0.45.0>,supervisor_report, [{supervisor,{local,couch_primary_services}}, {errorContext,child_terminated}, {reason,{{badmatch,[]}, [{couch_task_status,handle_cast,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}}, {offender,[{pid,<0.2720.204>}, {name,couch_task_status}, {mfa,{couch_task_status,start_link,[]}}, {restart_type,permanent}, {shutdown,brutal_kill}, {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.4889.204>] ** Generic server couch_task_status terminating
** Last message in was {'$gen_cast', {update_status,<0.9558.169>, <<"Copied 146001 of 271595 changes (53%)">>}}
** When Server state == nil
** Reason for termination ==
** {{badmatch,[]}, [{couch_task_status,handle_cast,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}

[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.4889.204>] {error_report,<0.23.0>, {<0.4889.204>,crash_report, [[{initial_call,{couch_task_status,init,['Argument__1']}}, {pid,<0.4889.204>}, {registered_name,couch_task_status}, {error_info,{exit,{{badmatch,[]}, [{couch_task_status,handle_cast,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]}, {messages,[]}, {links,[<0.45.0>]}, {dictionary,[]}, {trap_exit,false}, {status,running}, {heap_size,377}, {stack_size,24}, {reductions,127}], []]}}

[Tue, 15 Dec 2009 07:08:57 GMT] [error] [<0.45.0>] {error_report,<0.23.0>, {<0.45.0>,supervisor_report, [{supervisor,{local,couch_primary_services}}, {errorContext,child_terminated}, {reason,{{badmatch,[]}, [{couch_task_status,handle_cast,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}},
{offender,[{pid,<0.4889.204>}, {name,couch_task_status}, {mfa,{couch_task_status,start_link,[]}}, {restart_type,permanent}, {shutdown,brutal_kill}, {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.45.0>] {error_report,<0.23.0>, {<0.45.0>,supervisor_report, [{supervisor,{local,couch_primary_services}}, {errorContext,shutdown}, {reason,reached_max_restart_intensity}, {offender,[{pid,<0.6117.204>}, {name,couch_task_status}, {mfa,{couch_task_status,start_link,[]}}, {restart_type,permanent}, {shutdown,brutal_kill}, {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] Exit on non-updater process: killed

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] ** Generic server couch_view terminating
** Last message in was {'EXIT',<0.61.0>,killed}
** When Server state == {server,"/var/lib/couchdb/0.10.0"}
** Reason for termination ==
** killed

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.60.0>] {error_report,<0.23.0>, {<0.60.0>,crash_report, [[{initial_call,{couch_view,init,['Argument__1']}}, {pid,<0.60.0>}, {registered_name,couch_view}, {error_info,{exit,killed, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[couch_secondary_services,couch_server_sup, <0.1.0>]}, {messages,[]}, {links,[<0.52.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,2584}, {stack_size,24}, {reductions,5320}], []]}}

[Tue, 15 Dec 2009 07:09:02 GMT] [error] [<0.52.0>] {error_report,<0.23.0>, {<0.52.0>,supervisor_report, [{supervisor,{local,couch_secondary_services}}, {errorContext,child_terminated}, {reason,killed}, {offender,[{pid,<0.60.0>}, {name,view_manager}, {mfa,{couch_view,start_link,[]}}, {restart_type,permanent}, {shutdown,brutal_kill}, {child_type,worker}]}]}}

[Tue, 15 Dec 2009 07:08:44 GMT] [error] [<0.49.0>] {error_report,<0.23.0>, {<0.49.0>,crash_report, [[{initial_call,{couch_task_status,init,['Argument__1']}}, {pid,<0.49.0>}, {registered_name,couch_task_status}, {error_info, {exit, {function_clause,
[{couch_task_status,handle_info, [{#Ref<0.0.1832.61391>,3},nil]}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}}, {ancestors,[couch_primary_services,couch_server_sup,<0.1.0>]}, {messages,[]}, {links,[<0.45.0>]}, {dictionary,[]}, {trap_exit,false}, {status,running}, {heap_size,2584}, {stack_size,24}, {reductions,191624}], []]}}

> Replication tasks crash.
> ------------------------
>
> Key: COUCHDB-597
> URL: https://issues.apache.org/jira/browse/COUCHDB-597
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.11
> Reporter: Robert Newson
>
> If I kick off 10 replication tasks in quick succession, occasionally one or two of the replication tasks will die and not be resumed. It seems that the stat tracking is a little buggy, and under stress can eventually cause a permanent failure of the supervised replication task;
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [<0.80.0>] {error_report,<0.30.0>,
> {<0.80.0>,supervisor_report,
> [{supervisor,{local,couch_rep_sup}},
> {errorContext,shutdown_error},
> {reason,killed},
> {offender,
> [{pid,<0.6700.11>},
> {name,"fcbb13200a1618cf983b347f4d2c9835+create_target"},
> {mfa,
> {gen_server,start_link,
> [couch_rep,
> ["fcbb13200a1618cf983b347f4d2c9835",
> {[{<<"create_target">>,true},
> {<<"source">>,<<"http://node:5984/perf-p2">>},
> {<<"target">>,<<"perf-p2">>}]},
> {user_ctx,null,[<<"_admin">>]}],
> []]}},
> {restart_type,temporary},
> {shutdown,1},
> {child_type,worker}]}]}}
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [emulator] Error in process <0.6705.11> with exit value: {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.