Date: Tue, 2 Apr 2013 18:15:15 +0000 (UTC)
From: "Wendall Cada (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Subject: [jira] [Commented] (COUCHDB-1757) CouchDB 1.3.0rc3 crashes when _replicator contains a lot of docs

    [ https://issues.apache.org/jira/browse/COUCHDB-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620095#comment-13620095 ]

Wendall Cada commented on COUCHDB-1757:
---------------------------------------

What I've found is that ANY error in couchdb kills the _replicator database jobs. I have been able to manually confirm this by creating some random error elsewhere. I don't think this is a bug in only 1.3.0, but also in 1.2.x, as I've seen it there on several occasions.

> CouchDB 1.3.0rc3 crashes when _replicator contains a lot of docs
> ----------------------------------------------------------------
>
>                 Key: COUCHDB-1757
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1757
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Sander Dijkhuis
>
> I’m deploying an experimental game based on CouchDB with one user per database. For access control, I’m using several _replicator docs per user:
> - one filtered replication from the shared db to the user db,
> - one unfiltered replication from the user db to the shared db,
> - two replications using doc_ids per ‘friendship’ (to share both profiles).
> At the moment, this results in 420 continuous replications running. CouchDB 1.3.0rc3 on Ubuntu crashes a couple of seconds after starting, and doesn’t crash when I temporarily remove the _replicator database. When I used 1.3.0rc1, CouchDB would crash after a few minutes to a few hours.
> Some details from the crash report are below, filtered for privacy, to avoid repetition and to hide the _design doc that’s shown in the log. Let me know if you need more detail or if I should share one of the _design functions used.
> Am I abusing the replication system, or can I change a setting to allow for longer timeouts?
> --
> First, I get something like this for each _replicator doc:
> {code}
> [info] [<0.5368.0>] Replication `"5529b4bdb9c5bdc15b558bd7588511d9+continuous"` is using:
>     4 worker processes
>     a worker batch size of 500
>     20 HTTP connections
>     a connection timeout of 30000 milliseconds
>     10 retries per request
>     socket options are: [{keepalive,true},{nodelay,false}]
>     source start sequence 6908
> [info] [<0.5368.0>] Document `lunacy:to:USERNAME` triggered replication `5529b4bdb9c5bdc15b558bd7588511d9+continuous`
> [info] [<0.1213.0>] starting new replication `5529b4bdb9c5bdc15b558bd7588511d9+continuous` at <0.5368.0> (`lunacy` -> `lunacy/user/USERNAME`)
> {code}
> Then:
> {code}
> [error] [<0.5408.0>] OS Process died with status: 137
> [error] [<0.5408.0>] ** Generic server <0.5408.0> terminating
> ** Last message in was {#Port<0.2740>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2740>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> {code}
> Followed by:
> {code}
> =ERROR REPORT==== 2-Apr-2013::19:18:20 ===
> ** Generic server <0.5408.0> terminating
> ** Last message in was {#Port<0.2740>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2740>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> [error] [<0.5408.0>] {error_report,<0.31.0>,
> {<0.5408.0>,crash_report,
> [[{initial_call,
> {couch_os_process,init,['Argument__1']}},
> {pid,<0.5408.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {exit_status,137},
> [{gen_server,terminate,6},
> {proc_lib,init_p_do_apply,3}]}},
> {ancestors,
> [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.111.0>,<0.5339.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,running},
> {heap_size,1597},
> {stack_size,24},
> {reductions,1197}],
> [{neighbour,
> [{pid,<0.5345.0>},
> {registered_name,[]},
> {initial_call,
> {couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5339.0>]},
> {messages,[]},
> {links,[<0.5339.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]},
> {neighbour,
> [{pid,<0.5339.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5345.0>,<0.5408.0>,<0.5335.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1988}]}]]}}
> =CRASH REPORT==== 2-Apr-2013::19:18:21 ===
> crasher:
> initial call: couch_os_process:init/1
> pid: <0.5408.0>
> registered_name: []
> exception exit: {exit_status,137}
> in function gen_server:terminate/6
> ancestors: [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]
> messages: []
> links: [<0.111.0>,<0.5339.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 1597
> stack_size: 24
> reductions: 1197
> neighbours:
> neighbour: [{pid,<0.5345.0>},
> {registered_name,[]},
> {initial_call,{couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5339.0>]},
> {messages,[]},
> {links,[<0.5339.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]
> neighbour: [{pid,<0.5339.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5345.0>,<0.5408.0>,<0.5335.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1988}]
> [error] [<0.5335.0>] ChangesReader process died with reason: {exit_status,137}
> [error] [<0.111.0>] OS Process Error <0.5412.0> :: {os_process_error,
> "OS process timed out."}
> [error] [<0.5387.0>] OS Process died with status: 137
> [error] [<0.5385.0>] OS Process died with status: 137
> [error] [<0.5335.0>] Replication `f7ecf7f435811899c912619f899f24b4+continuous` (`lunacy` -> `lunacy/user/USERNAME`) failed: changes_reader_died
> [error] [<0.5258.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5387.0>] ** Generic server <0.5387.0> terminating
> ** Last message in was {#Port<0.2730>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2730>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> =ERROR REPORT==== 2-Apr-2013::19:18:21 ===
> ** Generic server <0.5387.0> terminating
> ** Last message in was {#Port<0.2730>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2730>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> [error] [<0.5385.0>] ** Generic server <0.5385.0> terminating
> ** Last message in was {#Port<0.2729>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2729>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> =ERROR REPORT==== 2-Apr-2013::19:18:21 ===
> ** Generic server <0.5385.0> terminating
> ** Last message in was {#Port<0.2729>,{exit_status,137}}
> ** When Server state == {os_proc,"/home/sander/git/apache-couchdb-1.3.0/build/bin/couchjs /home/sander/git/apache-couchdb-1.3.0/build/share/couchdb/server/main.js",
> #Port<0.2729>,
> #Fun,
> #Fun,5000}
> ** Reason for termination ==
> ** {exit_status,137}
> [error] [<0.5385.0>] {error_report,<0.31.0>,
> {<0.5385.0>,crash_report,
> [[{initial_call,
> {couch_os_process,init,['Argument__1']}},
> {pid,<0.5385.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {exit_status,137},
> [{gen_server,terminate,6},
> {proc_lib,init_p_do_apply,3}]}},
> {ancestors,
> [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.111.0>,<0.5207.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,running},
> {heap_size,1597},
> {stack_size,24},
> {reductions,1205}],
> [{neighbour,
> [{pid,<0.5213.0>},
> {registered_name,[]},
> {initial_call,
> {couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5207.0>]},
> {messages,[]},
> {links,[<0.5207.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]},
> {neighbour,
> [{pid,<0.5207.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5213.0>,<0.5385.0>,<0.5203.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1988}]}]]}}
> =CRASH REPORT==== 2-Apr-2013::19:18:22 ===
> crasher:
> initial call: couch_os_process:init/1
> pid: <0.5385.0>
> registered_name: []
> exception exit: {exit_status,137}
> in function gen_server:terminate/6
> ancestors: [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]
> messages: []
> links: [<0.111.0>,<0.5207.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 1597
> stack_size: 24
> reductions: 1205
> neighbours:
> neighbour: [{pid,<0.5213.0>},
> {registered_name,[]},
> {initial_call,{couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5207.0>]},
> {messages,[]},
> {links,[<0.5207.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]
> neighbour: [{pid,<0.5207.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5213.0>,<0.5385.0>,<0.5203.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1988}]
> [error] [<0.5387.0>] {error_report,<0.31.0>,
> {<0.5387.0>,crash_report,
> [[{initial_call,
> {couch_os_process,init,['Argument__1']}},
> {pid,<0.5387.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {exit_status,137},
> [{gen_server,terminate,6},
> {proc_lib,init_p_do_apply,3}]}},
> {ancestors,
> [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.111.0>,<0.5218.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,running},
> {heap_size,1597},
> {stack_size,24},
> {reductions,1205}],
> [{neighbour,
> [{pid,<0.5224.0>},
> {registered_name,[]},
> {initial_call,
> {couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5218.0>]},
> {messages,[]},
> {links,[<0.5218.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]},
> {neighbour,
> [{pid,<0.5218.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5224.0>,<0.5387.0>,<0.5214.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1947}]}]]}}
> =CRASH REPORT==== 2-Apr-2013::19:18:24 ===
> crasher:
> initial call: couch_os_process:init/1
> pid: <0.5387.0>
> registered_name: []
> exception exit: {exit_status,137}
> in function gen_server:terminate/6
> ancestors: [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]
> messages: []
> links: [<0.111.0>,<0.5218.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 1597
> stack_size: 24
> reductions: 1205
> neighbours:
> neighbour: [{pid,<0.5224.0>},
> {registered_name,[]},
> {initial_call,{couch_event_sup,init,['Argument__1']}},
> {current_function,{gen_server,loop,6}},
> {ancestors,[<0.5218.0>]},
> {messages,[]},
> {links,[<0.5218.0>,<0.89.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,987},
> {stack_size,9},
> {reductions,32}]
> neighbour: [{pid,<0.5218.0>},
> {registered_name,[]},
> {initial_call,{erlang,apply,2}},
> {current_function,{gen,do_call,4}},
> {ancestors,[]},
> {messages,[]},
> {links,[<0.5224.0>,<0.5387.0>,<0.5214.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,waiting},
> {heap_size,6765},
> {stack_size,104},
> {reductions,1947}]
> [error] [<0.5302.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5192.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5203.0>] ChangesReader process died with reason: {exit_status,137}
> [error] [<0.5214.0>] ChangesReader process died with reason: {exit_status,137}
> [error] [<0.3692.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5258.0>] Replication `3d6539a2a9e3201a6eacd0b7db4c7dd3+continuous` (`lunacy` -> `lunacy/user/USERNAME`) failed: changes_reader_died
> [error] [<0.5170.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5236.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5280.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5225.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5324.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5291.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5313.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5181.0>] ChangesReader process died with reason: shutdown
> [error] [<0.5269.0>] ChangesReader process died with reason: shutdown
> [error] [<0.111.0>] ** Generic server couch_query_servers terminating
> ** Last message in was {get_proc,{doc,<<"_design/server">>,
> {31,
> [<<2,129,73,127,145,177,85,156,51,70,79,
> 122,210,226,20,220>>, (ET CETERA)
> [],false,[]},
> {<<"_design/server">>,
> <<"31-0281497f91b1559c33464f7ad2e214dc">>}}
> ** When Server state == {qserver,32811,41005,45102,36908,[],
> {[{<<"reduce_limit">>,true},
> {<<"timeout">>,5000}]}}
> ** Reason for termination ==
> ** {bad_return_value,{os_process_error,"OS process timed out."}}
> {code}
> And finally:
> {code}
> {'$gen_call',
> {<0.3696.0>,#Ref<0.0.0.31225>},
> {unlink_proc,<0.3714.0>}},
> {'$gen_call',
> {<0.5174.0>,#Ref<0.0.0.31231>},
> {unlink_proc,<0.5379.0>}},
> {'$gen_call',
> {<0.5185.0>,#Ref<0.0.0.31237>},
> {unlink_proc,<0.5381.0>}},
> {'$gen_call',
> {<0.5196.0>,#Ref<0.0.0.31243>},
> {unlink_proc,<0.5383.0>}},
> {'$gen_call',
> {<0.5207.0>,#Ref<0.0.0.31249>},
> {unlink_proc,<0.5385.0>}},
> {'$gen_call',
> {<0.5218.0>,#Ref<0.0.0.31255>},
> {unlink_proc,<0.5387.0>}},
> {'$gen_call',
> {<0.5229.0>,#Ref<0.0.0.31261>},
> {unlink_proc,<0.5389.0>}},
> {'$gen_call',
> {<0.5240.0>,#Ref<0.0.0.31267>},
> {unlink_proc,<0.5391.0>}},
> {'$gen_call',
> {<0.5262.0>,#Ref<0.0.0.31273>},
> {unlink_proc,<0.5393.0>}},
> {'$gen_call',
> {<0.5273.0>,#Ref<0.0.0.31299>},
> {unlink_proc,<0.5395.0>}},
> {'$gen_call',
> {<0.5284.0>,#Ref<0.0.0.31305>},
> {unlink_proc,<0.5398.0>}},
> {'$gen_call',
> {<0.5295.0>,#Ref<0.0.0.31311>},
> {unlink_proc,<0.5400.0>}},
> {'$gen_call',
> {<0.5306.0>,#Ref<0.0.0.31317>},
> {unlink_proc,<0.5402.0>}},
> {'$gen_call',
> {<0.5317.0>,#Ref<0.0.0.31323>},
> {unlink_proc,<0.5404.0>}},
> {'$gen_call',
> {<0.5328.0>,#Ref<0.0.0.31329>},
> {unlink_proc,<0.5406.0>}},
> {'$gen_call',
> {<0.5339.0>,#Ref<0.0.0.31359>},
> {unlink_proc,<0.5408.0>}},
> {'EXIT',<0.5408.0>,{exit_status,137}},
> {'DOWN',#Ref<0.0.0.31331>,process,<0.5408.0>,
> {exit_status,137}},
> {'EXIT',<0.5412.0>,normal},
> {'DOWN',#Ref<0.0.0.31360>,process,<0.5412.0>,normal},
> {'DOWN',#Ref<0.0.0.31269>,process,<0.5393.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.21467>,process,<0.3714.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31313>,process,<0.5402.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31239>,process,<0.5383.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31245>,process,<0.5385.0>,
> {exit_status,137}},
> {'EXIT',<0.5387.0>,{exit_status,137}},
> {'DOWN',#Ref<0.0.0.31251>,process,<0.5387.0>,
> {exit_status,137}},
> {'DOWN',#Ref<0.0.0.31227>,process,<0.5379.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31263>,process,<0.5391.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31257>,process,<0.5389.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31301>,process,<0.5398.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31325>,process,<0.5406.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31319>,process,<0.5404.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31307>,process,<0.5400.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31233>,process,<0.5381.0>,
> shutdown},
> {'DOWN',#Ref<0.0.0.31275>,process,<0.5395.0>,
> shutdown}]},
> {links,[<0.94.0>]},
> {dictionary,[]},
> {trap_exit,true},
> {status,running},
> {heap_size,17711},
> {stack_size,24},
> {reductions,7801}],
> []]}}
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira