couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Newson <rnew...@apache.org>
Subject Re: CouchDB 1.2.0 indexing dies silently
Date Tue, 05 Jun 2012 11:41:12 GMT
Hi Sami,

Can you observe the output of /_active_tasks during the view build and
capture that for us? Particularly close to the end of indexing.

You are aware that a view build, by design, only catches up to the
database update sequence at the time you queried and then stops? I
just want to be sure you're not expecting subsequent changes to also
be indexed (until you query again) and that we have the same idea
about when the view build should be stopping.

B.

On 5 June 2012 12:34, Sami Sierla <sami.sierla@poplatek.fi> wrote:
> Hi,
>
> Actually we are not sure if we are hitting the timeout or not since there are no log
entries anymore after increasing os_process_timeout. It might be that some view code is not
well guarded though (you can actually see the code from server state part of log) - in any
case I think there should be some error log if the JS code causes indexer failure. Compacting
views/dbs seem to be fine with 1.2.0. We have no access to curl on production environment
so debugging is a bit difficult.
>
> In the testing environment we haven't seen this issue but the data is not identical and
the number of documents is much smaller (about 1/100).
>
> View indexing is periodically re-triggered by our java application so the views will
get eventually finished after multiple resumes.
>
> Regards,
> Sami
>
> On Jun 5, 2012, at 2:20 PM, Robert Newson wrote:
>
>> You can increase that timeout with;
>>
>> curl -XPUT localhost:5984/_config/couchdb/os_process_timeout -d '"60000"'
>>
>> B.
>>
>> On 5 June 2012 12:18, Robert Newson <rnewson@apache.org> wrote:
>>> Sounds like https://issues.apache.org/jira/browse/COUCHDB-994
>>>
>>> B.
>>>
>>> On 5 June 2012 11:07, Sami Sierla <sami.sierla@poplatek.fi> wrote:
>>>> Dave,
>>>>
>>>> Thank You for quick reply. The issues appear in a production environment
to which I don't have access to modify configuration or design documents. Log level at the
moment is "error"
>>>>
>>>> Below is a lengthy log dump we got when the os_process_timeout was 5000,
after increasing timeout to 30000 there has been no log entries at all when indexing stops.
>>>>
>>>> -----
>>>>
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15656.0>] OS Process Error
<0.15657.0> :: {os_process_error, "OS process timed out."}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [emulator] Error in process <0.15656.0>
with exit value: {{nocatch,{os_process_error,"OS process timed out."}},[{couch_os_process,prompt,2},{couch_query_servers,map_doc_raw,2},{couch_view_updater,'-do_maps/3-fun-0-',3},{couch_view_updater,do_maps,3}]}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] ** Generic server
<0.15648.0> terminating
>>>> ** Last message in was {'EXIT',<0.15653.0>,
>>>>                           {{nocatch,
>>>>                                {os_process_error,"OS process
timed out."}},
>>>>                            [{couch_os_process,prompt,2},
>>>>                             {couch_query_servers,map_doc_raw,2},
>>>>                             {couch_view_updater,'-do_maps/3-fun-0-',3},
>>>>                             {couch_view_updater,do_maps,3}]}}
>>>> ** When Server state == {group_state,undefined,<<"mutka_replicated">>,
>>>>                         {"/data/mutka/couchdb-index",<<"mutka_replicated">>,
>>>>                          {group,
>>>>                           <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,
>>>>                             204,20>>,
>>>>                           nil,<<"_design/transactionA-1.2.0">>,
>>>>                           <<"javascript">>,[],
>>>>                           [{view,0,0,0,
>>>>                             [<<"transactionByPaymentInstrument">>],
>>>>                             <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.paymentInstrumentId) { emit([doc.paymentInstrumentId,doc.startTimestamp],
null); } }">>,
>>>>                             nil,[],[]},
>>>>                            {view,1,0,0,
>>>>                             [<<"transactionByTerminal">>],
>>>>                             <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.paymentTerminalId) { emit([doc.paymentTerminalId,doc.startTimestamp],
null); } }">>,
>>>>                             nil,[],[]},
>>>>                            {view,2,0,0,
>>>>                             [<<"transactionBySession">>],
>>>>                             <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.protocolSessionId) { emit(doc.protocolSessionId,doc.protocolTransactionId);
} }">>,
>>>>                             nil,[],[]},
>>>>                            {view,3,0,0,
>>>>                             [<<"transactionByRayId">>],
>>>>                             <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.cId) { emit([-(-doc.cId),doc.startTimestamp], null);
} }">>,
>>>>                             nil,[],[]}],
>>>>                           {[]},
>>>>                           nil,0,0,nil,nil}},
>>>>                         {group,
>>>>                          <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,
>>>>                            204,20>>,
>>>>                          <0.15650.0>,<<"_design/transactionA-1.2.0">>,
>>>>                          <<"javascript">>,[],
>>>>                          [{view,0,236439939,0,
>>>>                            [<<"transactionByPaymentInstrument">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.paymentInstrumentId) { emit([doc.paymentInstrumentId,doc.startTimestamp],
null); } }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47573274456,{8694059,[]},257926106},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]},
>>>>                           {view,1,236439939,0,
>>>>                            [<<"transactionByTerminal">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.paymentTerminalId) { emit([doc.paymentTerminalId,doc.startTimestamp],
null); } }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47574093427,{33638477,[]},942288018},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]},
>>>>                           {view,2,236439939,0,
>>>>                            [<<"transactionBySession">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.protocolSessionId) { emit(doc.protocolSessionId,doc.protocolTransactionId);
} }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47574114746,{9241366,[]},131141244},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]},
>>>>                           {view,1,236439939,0,
>>>>                            [<<"transactionByTerminal">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.paymentTerminalId) { emit([doc.paymentTerminalId,doc.startTimestamp],
null); } }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47574093427,{33638477,[]},942288018},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]},
>>>>                           {view,2,236439939,0,
>>>>                            [<<"transactionBySession">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.protocolSessionId) { emit(doc.protocolSessionId,doc.protocolTransactionId);
} }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47574114746,{9241366,[]},131141244},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]},
>>>>                           {view,3,236433956,0,
>>>>                            [<<"transactionByRayId">>],
>>>>                            <<"function(doc) { if (doc.objectType
== \"ProtocolTransaction\" && doc.cId) { emit([-(-doc.cId),doc.startTimestamp], null);
} }">>,
>>>>                            {btree,<0.15650.0>,
>>>>                             {47559121340,{2250018,[]},76590679},
>>>>                             #Fun<couch_btree.3.71804109>,
>>>>                             #Fun<couch_btree.4.115144917>,
>>>>                             #Fun<couch_view.less_json_ids.2>,
>>>>                             #Fun<couch_view_group.10.26766604>,snappy},
>>>>                            [],[]}],
>>>>                          {[]},
>>>>                          {btree,<0.15650.0>,
>>>>                           {47572622835,[],1061098089},
>>>>                           #Fun<couch_btree.3.71804109>,
>>>>                           #Fun<couch_btree.4.115144917>,
>>>>                           #Fun<couch_btree.5.93788370>,nil,snappy},
>>>>                          236439939,0,nil,nil},
>>>>                         <0.15653.0>,nil,false,
>>>>                         [{{<0.15441.0>,#Ref<0.0.0.182446>},409571621}],
>>>>                         <0.15652.0>,false}
>>>> ** Reason for termination ==
>>>> ** {os_process_error,"OS process timed out."}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] {error_report,<0.31.0>,
>>>>                       {<0.15648.0>,crash_report,
>>>>                        [[{initial_call,
>>>>                           {couch_view_group,init,['Argument__1']}},
>>>>                          {pid,<0.15648.0>},
>>>>                          {registered_name,[]},
>>>>                          {error_info,
>>>>                           {exit,
>>>>                            {os_process_error,"OS process timed
out."},
>>>>                            [{gen_server,terminate,6},
>>>>                             {proc_lib,init_p_do_apply,3}]}},
>>>>                          {ancestors,[<0.15647.0>]},
>>>>                          {messages,[]},
>>>>                          {links,[<0.15650.0>,<0.123.0>]},
>>>>                          {dictionary,[]},
>>>>                          {trap_exit,true},
>>>>                          {status,running},
>>>>                          {heap_size,2584},
>>>>                          {stack_size,24},
>>>>                          {reductions,18059924}],
>>>>                         []]}}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15441.0>] Uncaught server
error: {os_process_error, <<"OS process timed out.">>}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] ** Generic server
<0.15650.0> terminating
>>>> ** Last message in was {'EXIT',<0.15648.0>, {os_process_error,"OS process
timed out."}}
>>>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2119>,19}},
47574426987}
>>>> ** Reason for termination ==
>>>> ** {os_process_error,"OS process timed out."}
>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] {error_report,<0.31.0>,
>>>>                       {<0.15650.0>,crash_report,
>>>>                        [[{initial_call,{couch_file,init,['Argument__1']}},
>>>>                          {pid,<0.15650.0>},
>>>>                          {registered_name,[]},
>>>>                          {error_info,
>>>>                           {exit,
>>>>                            {os_process_error,"OS process timed
out."},
>>>>                            [{gen_server,terminate,6},
>>>>                             {proc_lib,init_p_do_apply,3}]}},
>>>>                          {ancestors,[<0.15648.0>,<0.15647.0>]},
>>>>                          {messages,[{'EXIT',<0.15652.0>,shutdown}]},
>>>>                          {links,[]},
>>>>                          {dictionary,[]},
>>>>                          {trap_exit,true},
>>>>                          {status,running},
>>>>                          {heap_size,2584},
>>>>                          {stack_size,24},
>>>>                          {reductions,27732395236}],
>>>>                         []]}}
>>>>
>>>>
>>>> -Sami
>>>> On Jun 5, 2012, at 12:23 PM, Dave Cottlehuber wrote:
>>>>
>>>>> On 5 June 2012 11:13, Sami Sierla <sami.sierla@poplatek.fi> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We have a rather large database (about 90 million documents /200GB)
running on CouchDB (1.0.3) and we're now updating it to version 1.2.0 due to view compaction
problems (large view group compactions never finished).
>>>>>>
>>>>>> At the moment we are rebuilding (JavaScript) views with 1.2.0 but
during this we have stumbled upon to new problem : indexer processes suddenly just disappear.
Initially we got "OS Process Timeout" -errors to log but after adjusting os_process_timeout
to 30secs indexing still prematurely stops but without any log entry.
>>>>>>
>>>>>> Any ideas what might cause this behavior?
>>>>>>
>>>>>> CouchDB is running on RHEL 5.8 and is statically linked with SpiderMonkey
1.8.5
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Sami Sierla / Poplatek Oy / Finland
>>>>>
>>>>> Sami,
>>>>>
>>>>> Have you anything useful in the couch.log file? Are you able to run
>>>>> the view generation in debug mode (might not be possible due to disk
>>>>> space constraints  & performance impact).
>>>>>
>>>>> Also, if you query the view with ?limit=1&descending=true you'll
get
>>>>> the last doc that couch successfully processed (I think). Is there
>>>>> anything special about that or the subsequent documents? If you
>>>>> process the view & those docs manually into node or js.exe directly
>>>>> [1] does that work?
>>>>>
>>>>> There's quite a few changes in 1.0.3 -> 1.2.0 including better
>>>>> detection of ill-formed docs amongst others, more info will help
>>>>> narrow this down.
>>>>>
>>>>> A+
>>>>> Dave
>>>>>
>>>>> [1]: http://wiki.apache.org/couchdb/Troubleshooting#Map.2BAC8-Reduce_debugging
>>>>
>

Mime
View raw message