Subject: Re: CouchDB 1.2.0 indexing dies silently
From: Sami Sierla
Date: Tue, 5 Jun 2012 13:07:47 +0300
To: user@couchdb.apache.org

Dave,

Thank you for the quick reply.

The issues appear in a production environment where I don't have access to modify the configuration or design documents. The log level at the moment is "error".

Below is a lengthy log dump we got when os_process_timeout was 5000; after increasing the timeout to 30000 there have been no log entries at all when indexing stops.

-----
[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15656.0>] OS Process Error <0.15657.0> :: {os_process_error, "OS process timed out."}

[Thu, 31 May 2012 17:42:17 GMT] [error] [emulator] Error in process <0.15656.0> with exit value: {{nocatch,{os_process_error,"OS process timed out."}},[{couch_os_process,prompt,2},{couch_query_servers,map_doc_raw,2},{couch_view_updater,'-do_maps/3-fun-0-',3},{couch_view_updater,do_maps,3}]}

[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] ** Generic server <0.15648.0> terminating
** Last message in was {'EXIT',<0.15653.0>,
  {{nocatch, {os_process_error,"OS process timed out."}},
  [{couch_os_process,prompt,2},
  {couch_query_servers,map_doc_raw,2},
  {couch_view_updater,'-do_maps/3-fun-0-',3},
  {couch_view_updater,do_maps,3}]}}
** When Server state == {group_state,undefined,<<"mutka_replicated">>,
  {"/data/mutka/couchdb-index",<<"mutka_replicated">>,
  {group,
  <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,204,20>>,
  nil,<<"_design/transactionA-1.2.0">>,
  <<"javascript">>,[],
  [{view,0,0,0, [<<"transactionByPaymentInstrument">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.paymentInstrumentId) { emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>,
  nil,[],[]},
  {view,1,0,0, [<<"transactionByTerminal">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.paymentTerminalId) { emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>,
  nil,[],[]},
  {view,2,0,0, [<<"transactionBySession">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.protocolSessionId) {
emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>,
  nil,[],[]},
  {view,3,0,0, [<<"transactionByRayId">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.cId) { emit([-(-doc.cId),doc.startTimestamp], null); } }">>,
  nil,[],[]}],
  {[]},
  nil,0,0,nil,nil}},
  {group,
  <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,204,20>>,
  <0.15650.0>,<<"_design/transactionA-1.2.0">>,
  <<"javascript">>,[],
  [{view,0,236439939,0, [<<"transactionByPaymentInstrument">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.paymentInstrumentId) { emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>,
  {btree,<0.15650.0>, {47573274456,{8694059,[]},257926106}, #Fun, #Fun, #Fun, #Fun,snappy},
  [],[]},
  {view,1,236439939,0, [<<"transactionByTerminal">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.paymentTerminalId) { emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>,
  {btree,<0.15650.0>, {47574093427,{33638477,[]},942288018}, #Fun, #Fun, #Fun, #Fun,snappy},
  [],[]},
  {view,2,236439939,0, [<<"transactionBySession">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.protocolSessionId) { emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>,
  {btree,<0.15650.0>, {47574114746,{9241366,[]},131141244}, #Fun, #Fun, #Fun,
#Fun,snappy},
  [],[]},
  {view,3,236433956,0, [<<"transactionByRayId">>],
  <<"function(doc) { if (doc.objectType == \"ProtocolTransaction\" && doc.cId) { emit([-(-doc.cId),doc.startTimestamp], null); } }">>,
  {btree,<0.15650.0>, {47559121340,{2250018,[]},76590679}, #Fun, #Fun, #Fun, #Fun,snappy},
  [],[]}],
  {[]},
  {btree,<0.15650.0>, {47572622835,[],1061098089}, #Fun, #Fun, #Fun,nil,snappy},
  236439939,0,nil,nil},
  <0.15653.0>,nil,false,
  [{{<0.15441.0>,#Ref<0.0.0.182446>},409571621}],
  <0.15652.0>,false}
** Reason for termination ==
** {os_process_error,"OS process timed out."}

[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] {error_report,<0.31.0>,
  {<0.15648.0>,crash_report,
  [[{initial_call, {couch_view_group,init,['Argument__1']}},
  {pid,<0.15648.0>},
  {registered_name,[]},
  {error_info, {exit, {os_process_error,"OS process timed out."}, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}},
  {ancestors,[<0.15647.0>]},
  {messages,[]},
  {links,[<0.15650.0>,<0.123.0>]},
  {dictionary,[]},
  {trap_exit,true},
  {status,running},
  {heap_size,2584},
  {stack_size,24},
  {reductions,18059924}],
  []]}}

[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15441.0>] Uncaught server error: {os_process_error, <<"OS process timed out.">>}

[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] ** Generic server <0.15650.0> terminating
** Last message in was {'EXIT',<0.15648.0>, {os_process_error,"OS process timed out."}}
** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2119>,19}}, 47574426987}
** Reason for termination ==
** {os_process_error,"OS process timed out."}

[Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] {error_report,<0.31.0>,
  {<0.15650.0>,crash_report,
  [[{initial_call,{couch_file,init,['Argument__1']}},
  {pid,<0.15650.0>},
  {registered_name,[]},
  {error_info, {exit, {os_process_error,"OS process timed out."}, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}},
  {ancestors,[<0.15648.0>,<0.15647.0>]},
  {messages,[{'EXIT',<0.15652.0>,shutdown}]},
  {links,[]},
  {dictionary,[]},
  {trap_exit,true},
  {status,running},
  {heap_size,2584},
  {stack_size,24},
  {reductions,27732395236}],
  []]}}

-Sami

On Jun 5, 2012, at 12:23 PM, Dave Cottlehuber wrote:

> On 5 June 2012 11:13, Sami Sierla wrote:
>> Hi,
>>
>> We have a rather large database (about 90 million documents / 200GB) running on CouchDB (1.0.3) and we're now updating it to version 1.2.0 due to view compaction problems (large view group compactions never finished).
>>
>> At the moment we are rebuilding (JavaScript) views with 1.2.0, but during this we have stumbled upon a new problem: indexer processes suddenly just disappear. Initially we got "OS Process Timeout" errors in the log, but after adjusting os_process_timeout to 30 secs indexing still stops prematurely, now without any log entry.
>>
>> Any ideas what might cause this behavior?
>>
>> CouchDB is running on RHEL 5.8 and is statically linked with SpiderMonkey 1.8.5
>>
>>
>> Regards,
>> Sami Sierla / Poplatek Oy / Finland
>
> Sami,
>
> Have you anything useful in the couch.log file? Are you able to run
> the view generation in debug mode (might not be possible due to disk
> space constraints & performance impact)?
>
> Also, if you query the view with ?limit=1&descending=true you'll get
> the last doc that couch successfully processed (I think). Is there
> anything special about that or the subsequent documents? If you
> process the view & those docs manually in node or js.exe directly
> [1], does that work?
>
> There are quite a few changes in 1.0.3 -> 1.2.0, including better
> detection of ill-formed docs amongst others; more info will help
> narrow this down.
>
> A+
> Dave
>
> [1]: http://wiki.apache.org/couchdb/Troubleshooting#Map.2BAC8-Reduce_debugging
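
Running a map function by hand against a suspect document, as suggested in the quoted reply, could look roughly like the sketch below when run in node. The map function is copied from the transactionByRayId view in the log; the emit() stand-in, the runMap helper and the sample document are assumptions made for illustration and are not part of CouchDB.

```javascript
// Rows collected by the stand-in emit(); inside CouchDB the view server provides emit().
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function copied from the transactionByRayId view in the log.
function transactionByRayId(doc) {
  if (doc.objectType == "ProtocolTransaction" && doc.cId) {
    emit([-(-doc.cId), doc.startTimestamp], null);
  }
}

// Hypothetical helper: run one map function over one document and
// report what it emitted and how long the call took.
function runMap(mapFn, doc) {
  rows = [];
  const started = Date.now();
  mapFn(doc);
  return { rows: rows, elapsedMs: Date.now() - started };
}

// Hypothetical document; replace it with the real document that
// follows the last successfully indexed one.
const sampleDoc = {
  _id: "example-doc",
  objectType: "ProtocolTransaction",
  cId: "42",
  startTimestamp: 1338486137000
};

const result = runMap(transactionByRayId, sampleDoc);
console.log(JSON.stringify(result.rows), "in", result.elapsedMs, "ms");
```

If a particular document throws here or takes a long time to process, that points at the document rather than at the indexer; if every document maps quickly, the timeout is more likely to come from elsewhere.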